Data structures

Abstract Data Structures and Algorithms

Overview of standard data structures and useful algorithms

Why different data types?

One criterion: Complexity of Manipulation

The data structure can have an effect on how difficult the task is

Vector of size n: find the ith element

Efficient for the ith element

A single arithmetic calculation

Complexity does not increase as the vector gets bigger

O(1)

[Figure: a vector of six elements at indices 0 to 5; the first element is at Pos(v[0]) and the element at index 5 is at Pos(v[0]) + 5]

This is exactly the kind of operation a vector is designed for

Vector of size n: insert before the ith element

[Figure: the original vector of six elements at indices 0 to 5]

1. Allocate a vector of size n+1 (1 operation)

2. Copy elements 0 to i-1 to places 0 to i-1, and copy elements i to n-1 to places i+1 to n (n operations)

3. Set the new element at place i (1 operation)

[Figure: the copied vector with a free place at index i, then with the new element set]

Vectors are not designed to be used with insertion operations

As the vector gets larger, the insertion takes more time/operations

O(n)
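The three insertion steps can be sketched in Python as follows. This is a minimal illustration, not how any particular library implements its vectors; the function name `insert_before` and the sample values are assumptions made for the example.

```python
def insert_before(vector, i, value):
    """Return a new list with `value` inserted before index i."""
    n = len(vector)
    result = [None] * (n + 1)      # 1. allocate a vector of size n+1
    for k in range(i):             # 2a. copy elements 0..i-1 to places 0..i-1
        result[k] = vector[k]
    for k in range(i, n):          # 2b. copy elements i..n-1 to places i+1..n
        result[k + 1] = vector[k]
    result[i] = value              # 3. set the new element
    return result


v = [7, 8, 9, 10, 11, 12]          # hypothetical sample data
print(v[5])                        # finding an element: one step, O(1)
print(insert_before(v, 2, 13))     # inserting: every element is copied, O(n)
```

The two copy loops together touch all n elements, which is where the O(n) cost comes from.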

Complexity

O(1)

Time/operation complexity does not increase with the size of the problem

Example: Find ith element of vector

O(n)

Time/operation complexity increases linearly with the size of the problem

Example: Insert before ith element in vector

Linked List

Structure pair: an element and a pointer to the next element pair

[Figure: linked list with 6 elements, numbered 0 to 5]

Linked List: find the ith element

[Figure: linked list with 6 elements, numbered 0 to 5]

You have to traverse the structure to reach the ith element

Linked lists are not designed to find the ith element

As the list increases in size, the number of steps increases

O(n)

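Linked List: insert an element

[Figure: a new element is linked into the list 0 to 5 by redirecting pointers]

Change pointers… one operation

Linked lists are designed exactly for inserting an element

Once the insertion point is known, the insertion is still one operation, regardless of the size of the list

O(1)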
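A minimal Python sketch of a singly linked list shows both behaviours: finding the ith element walks the chain, while inserting after a known node only changes pointers. The names `Node`, `build`, `find`, `insert_after` and `to_list` are chosen for this illustration.

```python
class Node:
    """Structure pair: an element and a pointer to the next pair."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def build(values):
    """Build a linked list from a Python list and return its head."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def find(head, i):
    """Traverse i pointers from the head: O(n) steps."""
    node = head
    for _ in range(i):
        node = node.next
    return node


def insert_after(node, value):
    """Link a new element in after `node`: O(1), only pointers change."""
    node.next = Node(value, node.next)


def to_list(head):
    """Collect the elements in order, for display."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


head = build([0, 1, 2, 3, 4, 5])
insert_after(find(head, 2), 99)
print(to_list(head))
```

Note that the single-operation insertion assumes you already hold the node to insert after; reaching that node by position is still a traversal.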

Linear Search

I am thinking of a number between 1 and 10

If you just guess numbers (for example, sequentially):

Best case: correct on the 1st guess

Worst case: correct after 10 guesses

On average it will take you about 5 guesses

In general: for a number between 1 and n it will take you about n/2 guesses

Complexity: n/2 guesses

O(n)

Don't worry about the constant ½…

The complexity increases linearly with the size of the problem
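The guessing game can be counted directly. The sketch below, with the assumed helper name `linear_guesses`, tries the numbers in order and reports how many guesses each possible secret needs.

```python
def linear_guesses(secret, n):
    """Guess 1, 2, 3, ... and return how many guesses were needed."""
    guesses = 0
    for guess in range(1, n + 1):
        guesses += 1
        if guess == secret:
            return guesses
    raise ValueError("secret must be between 1 and n")


n = 10
counts = [linear_guesses(secret, n) for secret in range(1, n + 1)]
average = sum(counts) / n
print(min(counts), max(counts), average)
```

For n = 10 the exact average is (n + 1) / 2 = 5.5, which is the "about n/2" of the slide.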

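Binary Search

Extra information: for every guess I will say whether it is correct, higher or lower

[Figure: decision tree of guesses for the numbers 1 to 7: first guess 4, then 2 or 6, then 1, 3, 5 or 7]

Best case: 1 guess

Worst case: 3 guesses

At most log2 8 = 3 guesses are needed

In general: log2 n guesses

O(log n)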
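A short Python sketch of the higher/lower guessing game follows. The function name `binary_guesses` is an assumption for this example; each answer halves the range that is still possible.

```python
def binary_guesses(secret, low, high):
    """Guess the middle of the remaining range; return the number of guesses."""
    guesses = 0
    while low <= high:
        guess = (low + high) // 2
        guesses += 1
        if guess == secret:
            return guesses
        if guess < secret:
            low = guess + 1      # answer was "higher"
        else:
            high = guess - 1     # answer was "lower"
    raise ValueError("secret is outside the range")


worst_7 = max(binary_guesses(s, 1, 7) for s in range(1, 8))
worst_1000 = max(binary_guesses(s, 1, 1000) for s in range(1, 1001))
print(worst_7, worst_1000)
```

Going from 7 to 1000 possible numbers raises the worst case only from 3 to 10 guesses, which is the logarithmic growth.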

Complexity of operations

The proper data structure can increase the efficiency of an algorithm

For structures of size n, the cost of an operation either:

O(n): increases linearly with the size of the structure

O(1): does not depend on the size of the structure

Complexity of an Algorithm

O(c) Complexity does not increase with the size of the problem

Example: Find ith element in a vector

O(n) Complexity increases linearly with the size of the problem

Example: Find ith element in a linked list

O(log n) Complexity increases with the logarithm of the size of the problem

Example: Binary search

As the problem grows in size, how much more difficult (in terms of computation time/operations) does the problem become?

Coupling

Relationship between data structures and algorithms

Choose the wrong data structure and the algorithm becomes more complex

Why different data types?

Another criterion:

A specific object implies a data structure

Graph Data Structure

A set of nodes

A set of connections between the nodes

Both nodes and connections can have properties associated with them

A graph can be a natural representation for many data objects and processes

Social Network

Node: The person (Facebook page)

Connection: connects two people who know each other

Each node has a list of connections (the friends of the Facebook page)
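One common way to store such a graph is an adjacency list: each node maps to the list of nodes it is connected to. The sketch below uses a plain Python dictionary; the people and the helper names `add_friendship` and `mutual_friends` are invented for the illustration.

```python
# Hypothetical social network: each person (node) has a list of friends.
friends = {
    "Ann": ["Bob", "Cid"],
    "Bob": ["Ann"],
    "Cid": ["Ann", "Dee"],
    "Dee": ["Cid"],
}


def add_friendship(graph, a, b):
    """Add an undirected connection between a and b."""
    graph.setdefault(a, [])
    graph.setdefault(b, [])
    if b not in graph[a]:
        graph[a].append(b)
    if a not in graph[b]:
        graph[b].append(a)


def mutual_friends(graph, a, b):
    """People connected to both a and b."""
    return sorted(set(graph[a]) & set(graph[b]))


add_friendship(friends, "Bob", "Dee")
print(mutual_friends(friends, "Ann", "Dee"))
```

Because a friendship is mutual, the connection is stored on both nodes; a directed graph would store it on one side only.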

Inheritance: Object-oriented classes

Directed graph

Nodes: Data classes

Directed connection: One-way connection

Connections: One class inherits the properties of the other

Ontology (labeled connectors)

Nodes: Objects

Connections: Relationships between objects

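Arithmetic Expression (functional programming)

x + y + z * (a + b * c)

[Figure: expression tree. Each operator node is a function and its children are its arguments. The root + has the arguments x, y and a * node; that * node has the arguments z and a + node; that + node has the arguments a and a * node with the arguments b and c]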
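The expression tree can be written down as nested tuples, where the first item is the function and the rest are its arguments. This is a small sketch under that assumed representation; the variable values in `env` are made up for the example.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

# (function, argument, argument, ...); a string is a variable (leaf node).
expr = ("+", "x", "y", ("*", "z", ("+", "a", ("*", "b", "c"))))


def evaluate(node, env):
    """Evaluate a tree by evaluating the arguments, then applying the function."""
    if isinstance(node, str):
        return env[node]
    op, *args = node
    values = [evaluate(arg, env) for arg in args]
    result = values[0]
    for value in values[1:]:
        result = OPS[op](result, value)
    return result


env = {"x": 1, "y": 2, "z": 3, "a": 4, "b": 5, "c": 6}
print(evaluate(expr, env))  # 1 + 2 + 3 * (4 + 5 * 6) = 105
```

The nesting of the tuples carries the operator precedence, so no parentheses or precedence rules are needed during evaluation.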

Molecular Graph

Nodes: The atoms

Connections: The bonds between the atoms

Graph as a Linked list

[Figure: a graph of 13 nodes, numbered 1 to 13, drawn as linked lists]

Foundation of LISP: List programming

Functional programming (graph as a functional expression)

Stacks

Queues

Priority Queues

Stacks

Characteristics:

Top: the last thing added

To get to something in the middle, you have to remove what is on top first

LIFO: Last In, First Out

[Figure: a stack that starts as A; after Push B, Push C, Push D and Push E it holds E D C B A with E on top; Pop E leaves D C B A with D on top]

Two main operations: Push and Pop
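In Python a plain list can serve as a stack, with `append` as Push and `pop` as Pop. The sketch below replays the sequence of pushes followed by one pop.

```python
stack = ["A"]                      # the end of the list is the top of the stack

for item in ["B", "C", "D", "E"]:
    stack.append(item)             # Push: add on top

popped = stack.pop()               # Pop: remove the last thing added
print(popped, stack)
```

The element that comes off is E, the last one pushed, which is the LIFO behaviour.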

The Towers of Hanoi: A Stack-based Application

o GIVEN: three poles
o a set of discs on the first pole, discs of different sizes, the smallest discs at the top
o GOAL: move all the discs from the left pole to the right one.
o CONDITIONS: only one disc may be moved at a time.
o A disc can be placed either on an empty pole or on top of a larger disc.

Towers of Hanoi

Complexity: 2^n

Why?

To get to the bottom disc, you have to move all of the discs on top of it: 2^(n-1)

Then you move the bottom disc: 1

Then you have to move all the other discs back on top again: 2^(n-1)

2^(n-1) + 2^(n-1) = 2 * 2^(n-1) = 2^n

(This ignores the constant terms; the exact number of moves is 2^n - 1.)
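The argument translates directly into a recursive procedure on three stacks. The following Python sketch (function names `hanoi` and `solve` are assumptions) counts the moves and checks that no disc is ever placed on a smaller one.

```python
def hanoi(n, source, target, spare):
    """Move the top n discs from source to target; return the number of moves."""
    if n == 0:
        return 0
    moves = hanoi(n - 1, source, spare, target)    # move everything above out of the way
    disc = source.pop()                            # move the bottom disc
    assert not target or target[-1] > disc         # never onto a smaller disc
    target.append(disc)
    moves += 1
    moves += hanoi(n - 1, spare, target, source)   # move the others back on top
    return moves


def solve(n):
    """Solve the puzzle for n discs; poles are stacks with the top at the end."""
    left, middle, right = list(range(n, 0, -1)), [], []
    moves = hanoi(n, left, right, middle)
    return moves, right


for n in range(1, 6):
    print(n, solve(n)[0])
```

The counts follow the recurrence moves(n) = 2 * moves(n-1) + 1, which gives exactly 2^n - 1.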

The Towers of Hanoi: A Legend

In the great temple of Brahma in Benares, on a brass plate under the dome that marks the center of the world there are 64 disks of pure gold that the priests carry one at a time between these diamond needles

According to Brahma's immutable law: No disk may be placed on a smaller disk.

In the beginning of the world all 64 disks formed the Tower of Brahma on one needle.

Now, however, the process of transfer of the tower from one needle to another is in mid course.

When the last disk is finally in place, once again forming the Tower of Brahma but on a different needle, then will come the end of the world and all will turn to dust.

Is the End of the World Approaching?

• Problem complexity 2^n
• 64 gold discs
• Given 1 move a second

About 600,000,000,000 years until the end of the world
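The estimate can be checked with a few lines of arithmetic. The sketch assumes the exact move count 2^64 - 1 and a year of 365.25 days.

```python
moves = 2**64 - 1                          # moves needed for 64 discs
seconds_per_year = 365.25 * 24 * 60 * 60   # assumed length of a year
years = moves / seconds_per_year           # at 1 move per second

print(f"{years:.3e} years")
```

The result is a little under 6 * 10^11 years, which rounds to the figure on the slide.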

Queues

FIFO: First In, First Out

Objects are inserted at the back and removed from the front
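A minimal Python sketch of a queue uses `collections.deque`, which supports efficient insertion at the back and removal from the front. The job names are placeholders.

```python
from collections import deque

queue = deque()

for job in ["A", "B", "C"]:
    queue.append(job)        # insert at the back

first = queue.popleft()      # remove from the front
print(first, list(queue))
```

The first element inserted is the first one removed, in contrast to the stack.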

Queues

Computer systems must often provide a “holding area” for messages between two processes, two programs, or even two systems.

Real time systems

Queue: Buffering

Computer sends data faster than the printer can print

Printer Buffer

Priority Queue

Like a regular queue or stack data structure, but where additionally each element has a "priority" associated with it.

An element with high priority is served before an element with low priority.

If two elements have the same priority, they are served according to their order in the queue.

There is an ordering associated with the queue
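One possible Python sketch of this behaviour builds on the standard `heapq` module. The class name `PriorityQueue` and its methods are written for this illustration; a running counter is stored with each element so that equal priorities are served in arrival order.

```python
import heapq
import itertools


class PriorityQueue:
    """Higher priority is served first; ties are served in insertion order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # arrival order, used as tie-breaker

    def push(self, item, priority):
        # heapq keeps the smallest entry first, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._counter), item))

    def pop(self):
        return heapq.heappop(self._heap)[2]


pq = PriorityQueue()
pq.push("low", 1)
pq.push("high-1", 5)
pq.push("high-2", 5)
pq.push("mid", 3)
served = [pq.pop() for _ in range(4)]
print(served)
```

Both push and pop take O(log n) operations on a heap, compared with O(n) for keeping a sorted vector.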

Programming Paradigms

• Goto (like assembler and primitive/older languages)

• Iteration and Loops (while and for-next)

• Functional languages and Recursion

• Declarative

• Non-deterministic programming

Example: Factorial

n! = 1 * 2 * 3 * … * n (implies a loop)

n! = n * (n-1)!, with 1! = 1 (recursive mathematical definition)

Goto statement

Loops with a GOTO (or similar) statement

The GOTO jumps to a specified location (label or address)

An index is involved

The index is incremented until the end is reached

i = 1
factorial = 1
loop:
  factorial = factorial * i
  if (i = n) goto exit
  i = i + 1
  goto loop
exit:

Iteration

Repetition of a block of code

An index is involved

The index is incremented until the end is reached

i = 1
factorial = 1
while (i <= n) {
  factorial = factorial * i
  i = i + 1
}

Once again involves an iteration counter

factorial = 1
for i = 1 to n {
  factorial = factorial * i
}
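The two loop forms can be written as runnable Python. The function names `factorial_while` and `factorial_for` are chosen for this sketch.

```python
def factorial_while(n):
    """Factorial with an explicit index and a while loop."""
    i = 1
    factorial = 1
    while i <= n:
        factorial = factorial * i
        i = i + 1
    return factorial


def factorial_for(n):
    """Factorial with a for loop; the loop manages the counter."""
    factorial = 1
    for i in range(1, n + 1):
        factorial = factorial * i
    return factorial


print(factorial_while(5), factorial_for(5))
```

Both versions perform n multiplications, so the complexity is O(n).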

Recursion

Numerische Mathematik 2, 312--318 (1960)

Content of Recursion

• Base case(s).
o Values of the input variables for which we perform no recursive calls are called base cases (there should be at least one base case).
o Every possible chain of recursive calls must eventually reach a base case.

• Recursive calls.
o Calls to the current method.
o Each recursive call should be defined so that it makes progress towards a base case.

factorial(n) {
  if (n = 1) return 1
  return factorial(n-1) * n
}

How do I write a recursive function?

• Determine the size factor
o The number: smaller number, smaller size

• Determine the base case(s)
o The case for n=1, the answer is 1

• Determine the general case(s)
o The recursive call: factorial(n)=factorial(n-1)*n

• Verify the algorithm (use the "Three-Question-Method")

factorial(n) {
  if (n = 1) return 1
  return factorial(n-1) * n
}

Three-Question Verification Method

1. The Base-Case Question: Is there a nonrecursive way out of the function, and does the routine work correctly for this "base" case?

2. The Smaller-Caller Question: Does each recursive call to the function involve a smaller case of the original problem, leading inescapably to the base case?

3. The General-Case Question: Assuming that the recursive call(s) work correctly, does the whole function work correctly?
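As a sketch, the recursive factorial can be annotated with the three questions. The input check for n < 1 is an addition for this example, so that every chain of calls is guaranteed to reach the base case.

```python
def factorial(n):
    """Recursive factorial for integers n >= 1."""
    if n < 1:
        raise ValueError("n must be at least 1")   # assumption: reject other inputs
    # 1. Base case: a nonrecursive way out, and 1! = 1 is correct.
    if n == 1:
        return 1
    # 2. Smaller caller: n - 1 is smaller than n and moves towards 1.
    # 3. General case: if factorial(n - 1) is correct, then (n-1)! * n = n!.
    return factorial(n - 1) * n


print(factorial(5))
```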

Stacks in recursion

factorial(n)
  if (n = 1)
    return 1
  else
    return factorial(n-1) * n

n! = n*(n-1)*(n-2)*(n-3)*…*1

5! = 5*4*3*2*1

Each call is pushed onto the stack until the base case is reached; the results are then returned in reverse order:

Factorial(5)
  Factorial(4)
    Factorial(3)
      Factorial(2)
        Factorial(1): return 1
      return 2
    return 6
  return 24
return 120

5! = 120

Deep recursion can result in running out of memory

Tail recursion

Tail recursion is iteration

factorial(n) {
  factorial-help(n, 1)
}
factorial-help(n, acc) {
  if (n = 1) return acc
  return factorial-help(n-1, acc*n)
}

A tail-recursive function is one where every recursive call is the last thing done by the function before returning, and thus produces the function's value

Tail recursion is a pattern of use that can be compiled or interpreted as iteration, avoiding the inefficiencies of deep recursion
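The accumulator pattern and the loop it corresponds to can be placed side by side. This is a Python sketch; note that the standard CPython interpreter does not turn tail calls into iteration, so in Python the loop has to be written by hand, as `factorial_loop` does here.

```python
def factorial_help(n, acc):
    """Tail-recursive helper: the recursive call is the last thing done."""
    if n == 1:
        return acc
    return factorial_help(n - 1, acc * n)


def factorial(n):
    return factorial_help(n, 1)


def factorial_loop(n):
    """The same computation as iteration: the parameters become loop variables."""
    acc = 1
    while n != 1:
        acc = acc * n
        n = n - 1
    return acc


print(factorial(5), factorial_loop(5))
```

Each recursive call `factorial_help(n - 1, acc * n)` becomes one pass through the loop that updates `acc` and `n`, so no stack of pending calls is needed.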

Declarative programming

Expresses the logic of a computation without describing its control flow.

factorial(1,1).

factorial(N,F) :- N1 is N-1,
    factorial(N1,F1), F is N*F1.

Constraint Logic Programming

factorial(1,1).

factorial(N,F) :- N1 is N-1,
    factorial(N1,F1), F is N*F1.

factorial(5,F) returns F=120

factorial(N,120) creates an instantiation error

PROLOG has no knowledge of Real or Integer numbers

Mathematical manipulations cannot be made


[Figure: the factorial program runs in the Logic Programming system, which is coupled to a CLP solver]

Logic Programming: formulas are passed to the CLP solver

CLP: reduced or solved formulas are returned

CLP has mathematical knowledge about the numbers used

Probabilistic Algorithms

Non-deterministic
• No exact control of program flow
• Leaves some of its decisions to chance
• Outcome of the program in different runs is not necessarily the same

Monte Carlo Methods
• Always gives an answer
• But not necessarily correct
• The probability of correctness goes up with time

Las Vegas Methods
• Never returns an incorrect answer
• But sometimes it doesn't give an answer


Probabilistic Algorithms in optimization:

Closer to human reasoning and problem solving (for hard problems we don't follow strict deterministic algorithms)

Finding local and global minimum

Classic gradient optimization finds a local minimum

The search path is always downhill toward the minimum

Probabilistic algorithms allow the search to go uphill sometimes

Randomness in the search for the next step

Genetic Algorithms and Simulated Annealing to find the global minimum


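Calculate pi with a dart board

[Figure: a circle of diameter d inscribed in a square of side d]

Area of square: d^2

Area of circle: π (d/2)^2 = π d^2 / 4

Probability that a dart will land in the circle: (π d^2 / 4) / d^2 = π / 4

(number of darts in circle) divided by (number of darts in total) times 4 is approximately π

Monte Carlo Method
• Always gives an answer
• But not necessarily correct
• The probability of correctness goes up with time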
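The dart board experiment can be simulated in a few lines of Python. This is a sketch under stated assumptions: a square of side d = 1 centred on the origin, darts that land uniformly at random in the square, and a fixed seed so that runs are repeatable. The function name `estimate_pi` is chosen for the example.

```python
import random


def estimate_pi(num_darts, seed=0):
    """Throw darts at a unit square and count those inside the inscribed circle."""
    rng = random.Random(seed)
    in_circle = 0
    for _ in range(num_darts):
        x = rng.uniform(-0.5, 0.5)
        y = rng.uniform(-0.5, 0.5)
        if x * x + y * y <= 0.25:          # inside the circle of radius d/2 = 0.5
            in_circle += 1
    return 4 * in_circle / num_darts       # fraction in circle, times 4


for n in (100, 10_000, 100_000):
    print(n, estimate_pi(n))
```

Every run returns an answer, but only an approximate one; the typical error shrinks roughly with the square root of the number of darts.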
