Data Structure and Algorithm-1

8/7/2019 Data Structure and Algorithm-1


Data Structure and Algorithm

Algorithm: an outline of the essence of a computational procedure; step-by-step instructions.

Program: an implementation of an algorithm in some programming language.

Data Structure: the organization of the data needed to solve the problem.



What is a good algorithm?

Efficiency: running time and space used.

Efficiency is expressed as a function of input size, where input size is the number of bits in the input or the number of data elements.



Analysis of Algorithms

The running time of an algorithm typically depends on:
- the size of the input
- the data structure
- the hardware environment (clock speed, processor, memory, disk speed)
- the software environment (OS, language, interpreted or compiled)

If the hardware and software environment remain the same, the running time depends on the size of the input and the data structure.

When experimenting with an algorithm on test inputs, the test inputs must be representative, and all tests must be run on the same hardware and software environment.



Analysis of Algorithms

We need an analytical framework for analyzing algorithms without actual experimentation. Such a framework:
- allows us to compare the efficiency of two algorithms;
- considers all possible inputs;
- can be applied without actually implementing or running the algorithm.

Developing such an analysis methodology involves several components:
- a standard language for expressing the algorithm;
- a computational model for the algorithm;
- a metric for measuring the running time of the algorithm;
- a way of expressing running time in terms of input size, including for recursive algorithms.



Limitations of Experimental Studies

It is necessary to implement and test the algorithm in order to determine its running time.

Experiments can be done only on a limited set of inputs, while infinitely many inputs are possible. The measured running time may therefore not be indicative, because many inputs are left out of the experiments. In addition, the same hardware and software must be used for all experiments.



Without Actual Experimentation

We should develop a general methodology for analyzing the running time of algorithms. This approach:
- uses a high-level description of the algorithm instead of testing a few of its implementations;
- takes into account all possible inputs;
- allows one to evaluate the efficiency of an algorithm in a way that is independent of the hardware and software environment.



Language for Expressing Algorithms: Pseudo-code

Pseudo-code is a way of expressing algorithms that uses a mixture of English phrases and indentation to make the steps in the solution explicit.

Pseudo-code mixes standard programming-language constructs with natural language. There are no strict grammar rules in pseudo-code.

Pseudo-code is not case sensitive.

The following is the insertion sort algorithm on a zero-based array A:

1. for j ← 1 to length(A) - 1
2.     key ← A[j]
3.     ▷ A[j] is inserted into the sorted sequence A[0 .. j-1]
4.     i ← j - 1
5.     while i >= 0 and A[i] > key
6.         A[i + 1] ← A[i]
7.         i ← i - 1
8.     A[i + 1] ← key
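A minimal Python sketch of the insertion sort pseudo-code above, assuming a Python list plays the role of the zero-based array A. The function name insertion_sort is our own choice; the comments map each part back to the numbered steps.

```python
def insertion_sort(a):
    """Sort the list a in place (ascending) and return it."""
    for j in range(1, len(a)):      # step 1: j <- 1 to length(A) - 1
        key = a[j]                  # step 2
        i = j - 1                   # step 4
        # steps 5-7: shift larger elements of the sorted prefix a[0..j-1] right
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key              # step 8: drop key into the gap
    return a


print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```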



Pseudo-code Constructs

The programming-language constructs we use in pseudo-code are taken from high-level languages such as C and C++. They are as follows:
- Expression: ← is used for assignment and = is used for comparison.
- Method declaration: callx(param1, param2) declares a method named callx with two input parameters, param1 and param2.
- Decision structure: if ... then ... else. Indentation is used to show the scope.
- While loop: while ... do. Here too, indentation is used to show the scope.
- For loop: for ... do. Indentation is used to show the scope of the for.
- Repeat loop: repeat <action> until <condition>.
- Array indexing: A[i] represents the i-th cell of the array A.
- Method call: object.method(), where method belongs to object.
- Method return: return value returns the value of the method.



Random Access Machine (RAM) Model

A Random Access Machine (RAM) is a theoretical computer model that has an unlimited number of registers of unlimited size, which can be accessed randomly. A program runs on it step by step and consists of a limited number of instructions from a simple instruction set. The set includes very basic arithmetic operations, jump instructions, and direct and indirect access to the registers. A RAM can perform any primitive operation in a constant number of steps, independent of the size of the input.



Primitive Operations

To get an idea of the running time of a pseudo-code algorithm without actually implementing it, we define a set of high-level primitive operations that can be identified in pseudo-code and are independent of the implementation. To estimate the running time, we count the number of primitive operations the pseudo-code performs; these counts let us compare two pseudo-code algorithms. We generally consider the following primitive operations:
- assigning a value to a variable
- calling a method
- performing an arithmetic operation
- comparing two numbers
- indexing into an array
- following an object reference
- returning from a method



Analysis of Algorithm by counting primitive operations

By inspecting the pseudo-code, we can count the number of primitive operations required by an algorithm, and thus get an idea of its running time.

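Primitive operations (A is one-based here and n = length(A); where the best and worst cases differ, both counts are given):

Line of pseudo-code                                        Primitive operations
for j ← 2 to length(A)                                     n-1
    key ← A[j]                                             n-1
    ▷ A[j] is inserted into the sorted sequence A[1 .. j-1]
    i ← j - 1                                              n-1
    while i > 0 and A[i] > key                             best: 2(n-1); worst: (n-1)(n+2)/2
        A[i + 1] ← A[i]                                    best: 0; worst: n(n-1)/2
        i ← i - 1                                          best: 0; worst: n(n-1)/2
    A[i + 1] ← key                                         n-1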
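The counts in the table can be checked empirically. The sketch below is our own instrumentation (insertion_sort_counts is a hypothetical helper): it runs a zero-based insertion sort, which evaluates its while test exactly as often as the one-based version, and counts each evaluation of the while test once, together with the number of shifts. Note that the table's best-case figure 2(n-1) counts the two comparisons inside the test separately, whereas here one evaluation of the whole test counts as one.

```python
def insertion_sort_counts(values):
    """Zero-based insertion sort on a copy of values.

    Returns (sorted_list, tests, shifts): tests counts evaluations of the
    while-loop condition, shifts counts executions of A[i+1] <- A[i].
    """
    a = list(values)
    tests = shifts = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while True:
            tests += 1                          # one evaluation of the while test
            if not (i >= 0 and a[i] > key):
                break
            a[i + 1] = a[i]                     # shift one element right
            shifts += 1
            i -= 1
        a[i + 1] = key
    return a, tests, shifts


n = 10
_, best_tests, best_shifts = insertion_sort_counts(range(n))            # sorted input
_, worst_tests, worst_shifts = insertion_sort_counts(range(n, 0, -1))   # reverse-sorted input
print(best_tests, best_shifts)    # 9 0    i.e. n-1 tests, no shifts
print(worst_tests, worst_shifts)  # 54 45  i.e. (n-1)(n+2)/2 tests, n(n-1)/2 shifts
```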



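Best-case, Average-case and Worst-case Analysis

Line of pseudo-code                                        Cost   Times
for j ← 2 to n                                             c1     n
    key ← A[j]                                             c2     n-1
    ▷ A[j] is inserted into the sorted sequence A[1 .. j-1]
    i ← j - 1                                              c3     n-1
    while i > 0 and A[i] > key                             c4     Σ t_k (k = 2..n)
        A[i + 1] ← A[i]                                    c5     Σ (t_k - 1) (k = 2..n)
        i ← i - 1                                          c6     Σ (t_k - 1) (k = 2..n)
    A[i + 1] ← key                                         c7     n-1

Let us do a cost analysis of insertion sort. Here t_k is the number of times the while-loop test is executed when j = k. Summing cost × times over all lines gives

$$T(n) = c_1 n + (c_2 + c_3 + c_7)(n - 1) + c_4 \sum_{k=2}^{n} t_k + (c_5 + c_6) \sum_{k=2}^{n} (t_k - 1)$$

$$T(n) = n(c_1 + c_2 + c_3 + c_7) + (c_4 + c_5 + c_6) \sum_{k=2}^{n} t_k - (c_5 + c_6)(n - 1) - (c_2 + c_3 + c_7)$$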




Best case for insertion sort: the numbers are already sorted, so t_k = 1 and the running time is Θ(n).

Worst case: the numbers are in reverse sorted order, so t_k = k and the running time is Θ(n²).

Average case: t_k ≈ k/2, and the running time is still Θ(n²).
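The average-case claim can be checked exactly for small n. Assuming distinct keys and that every ordering of the input is equally likely (our assumption; the slide does not state the input distribution), the number of shifts insertion sort performs equals the number of inversions in the input, and its average over all n! orderings is n(n−1)/4, which is still quadratic. The sketch below (helper names shifts and average_shifts are ours) averages over every permutation with exact fractions.

```python
from fractions import Fraction
from itertools import permutations


def shifts(values):
    """Number of times insertion sort executes A[i+1] <- A[i] on values."""
    a = list(values)
    count = 0
    for j in range(1, len(a)):
        key, i = a[j], j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
            count += 1
        a[i + 1] = key
    return count


def average_shifts(n):
    """Exact average of shifts(...) over all n! orderings of n distinct keys."""
    perms = list(permutations(range(n)))
    return Fraction(sum(shifts(p) for p in perms), len(perms))


for n in range(1, 7):
    print(n, average_shifts(n), Fraction(n * (n - 1), 4))  # the two columns agree
```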

[Figure: Time vs. input size graph]



Best-case, Average-case and Worst-case Analysis

The best, worst and average cases of a given algorithm express what the resource usage is at least, at most and on average, respectively.

Usually the resource being considered is running time, but it could also be memory or another resource. In real-time computing, the worst-case execution time is often of particular concern, since we need to know how much time might be needed in the worst case to guarantee that the algorithm will always finish on time. Average-case and worst-case performance are the most used in algorithm analysis; best-case performance is less widely used. Probabilistic analysis techniques, especially expected value, are used to determine the average case. The average case is often as bad as the worst case, and finding the average case can be very difficult.



Analyzing Recursive Algorithms




T(n) denotes the running time of an algorithm on an input of size n.
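The recursive-analysis slides did not survive extraction, so here is a small illustrative example of our own (recursive_sum is a hypothetical function, not from the original slides). Each call does a constant amount of work besides its one recursive call, which gives the recurrence T(n) = T(n − 1) + c with T(0) = c, and therefore T(n) = c(n + 1) = O(n). The sketch counts calls to make the recurrence visible.

```python
def recursive_sum(a, n, calls):
    """Return a[0] + ... + a[n-1] recursively; calls[0] counts invocations.

    Recurrence: T(0) = c and T(n) = T(n - 1) + c, so T(n) = c(n + 1) = O(n).
    """
    calls[0] += 1
    if n == 0:                                   # base case: constant work
        return 0
    return recursive_sum(a, n - 1, calls) + a[n - 1]   # one smaller subproblem


calls = [0]
print(recursive_sum([3, 1, 4, 1, 5], 5, calls), calls[0])  # 14 6
```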



Asymptotic Analysis

Goal: to simplify the analysis of running time by getting rid of details that may be affected by the specific implementation and hardware.

Capturing the essence: how the running time of an algorithm increases with the size of the input, in the limit.

Example: $T(n) = 13n^3 + 42n^2 + 2n\log n + 4n$.

As n grows larger, $n^3$ is MUCH larger than $n^2$, $n\log n$ and $n$, so it dominates T(n).

The running time grows "roughly on the order of $n^3$". Notationally, $T(n) = O(n^3)$.

This means ignoring constants in T(n) and analyzing T(n) as n "gets large".
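A quick numerical look at why the $n^3$ term dominates: the ratio $T(n)/n^3$ approaches the leading constant 13 as n grows, because the other terms shrink relative to $n^3$. The slide does not give a base for the logarithm; the sketch assumes base 2.

```python
import math


def T(n):
    """Example running time T(n) = 13n^3 + 42n^2 + 2n log n + 4n (log base 2 assumed)."""
    return 13 * n**3 + 42 * n**2 + 2 * n * math.log2(n) + 4 * n


for n in (10, 100, 1000, 10**6):
    print(n, T(n) / n**3)  # the ratio approaches 13 as n grows
```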



Asymptotic Notation

Definition (Big Oh). Consider a function f(n) which is non-negative for all integers n > 0. We say that "f(n) is big oh g(n)", which we write f(n) = O(g(n)), if there exist an integer $n_0$ and a constant c > 0 such that for all integers $n \ge n_0$, $f(n) \le c\,g(n)$.
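The definition can be probed mechanically for a proposed witness pair (c, $n_0$). The helper below (holds_on_range is our own name) only checks a finite range, so it can refute a claimed pair but can never prove $f(n) = O(g(n))$; a proof has to cover every $n \ge n_0$. It uses the example f(n) = 5n + 2 from the later examples slide.

```python
def holds_on_range(f, g, c, n0, n_max):
    """Check f(n) <= c * g(n) for every integer n with n0 <= n <= n_max.

    A finite check can refute a proposed (c, n0) but cannot prove
    f(n) = O(g(n)); the inequality must hold for all n >= n0.
    """
    return all(f(n) <= c * g(n) for n in range(n0, n_max + 1))


f = lambda n: 5 * n + 2
g = lambda n: n
print(holds_on_range(f, g, c=6, n0=3, n_max=10**5))  # True: 5n + 2 <= 6n once n >= 2
print(holds_on_range(f, g, c=5, n0=1, n_max=10**5))  # False: 5n + 2 > 5n for every n
```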



Asymptotic Notation

Big O notation is a huge simplification; can we justify it?

It only makes sense for large problem sizes. For sufficiently large problem sizes, the highest-order term swamps all the rest!

Consider $R = x^2 + 3x + 5$ as x varies:

x          x^2               3x        5    R
0          0                 0         5    5
10         100               30        5    135
100        10,000            300       5    10,305
1,000      1,000,000         3,000     5    1,003,005
10,000     100,000,000       30,000    5    100,030,005
100,000    10,000,000,000    300,000   5    10,000,300,005
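The table can be regenerated in a few lines, together with the share of R contributed by the highest-order term $x^2$, which makes "swamps all the rest" concrete. The helper name rows is our own.

```python
def rows(xs):
    """Return (x, R, x^2 / R) for R = x^2 + 3x + 5 and each x in xs."""
    out = []
    for x in xs:
        r = x**2 + 3 * x + 5
        out.append((x, r, x**2 / r))  # fraction of R coming from x^2
    return out


for x, r, share in rows([0, 10, 100, 1_000, 10_000, 100_000]):
    print(f"x = {x:>7,}  R = {r:>17,}  x^2/R = {share:.6f}")
```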



Some properties of Big Oh notation:
- The fastest-growing function dominates a sum: O(f(n) + g(n)) is O(max{f(n), g(n)}).
- The product of upper bounds is an upper bound for the product: if f is O(g) and h is O(r), then fh is O(gr).
- "f is O(g)" is transitive: if f is O(g) and g is O(h), then f is O(h).
- If d is O(f), then ad is O(f) for any constant a > 0.
- If d is O(f) and e is O(g), then d + e is O(f + g).
- If $f(n) = a_0 + a_1 n + \dots + a_d n^d$, then f(n) is $O(n^d)$.
- $\log(n^x)$ is O(log n) for any fixed x > 0.

Hierarchy of functions: O(1), O(log n), O($n^{1/2}$), O(n), O(n log n), O($n^2$), O($2^n$), O(n!).



An Example of Solving Big O

Thus c = 35 and $n_0 = 1$.

Other examples:
- f(n) = 5n + 2 = O(n), with g(n) = n: f(n) ≤ 6n for n ≥ 3 (c = 6, $n_0$ = 3).
- f(n) = n/2 − 3 = O(n): f(n) ≤ 0.5n for n ≥ 0 (c = 0.5, $n_0$ = 0).



Classifying Algorithm Based on Big Oh

A function f(n) is said to be of at most logarithmic growth if f(n) = O(log n).

A function f(n) is said to be of at most quadratic growth if $f(n) = O(n^2)$.

A function f(n) is said to be of at most polynomial growth if $f(n) = O(n^k)$ for some natural number k > 1.

A function f(n) is said to be of at most exponential growth if there is a constant c > 1 such that $f(n) = O(c^n)$.

A function f(n) is said to be of at most factorial growth if f(n) = O(n!).

A function f(n) is said to have constant running time if the size of the input n has no effect on the running time of the algorithm (e.g., assigning a value to a variable). The equation for this algorithm is f(n) = c.

Other logarithmic classifications: f(n) = O(n log n) and f(n) = O(log log n).



Growth Rates Compared

           n=1   n=2   n=4   n=8    n=16     n=32
1          1     1     1     1      1        1
log n      0     1     2     3      4        5
n          1     2     4     8      16       32
n log n    0     2     8     24     64       160
n^2        1     4     16    64     256      1,024
n^3        1     8     64    512    4,096    32,768
2^n        2     4     16    256    65,536   4,294,967,296
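The growth-rate table above can be regenerated directly. This sketch takes log n to be base 2, as the table does, and computes it with integer arithmetic (n.bit_length() - 1, which equals log2(n) exactly for the powers of two used here).

```python
SIZES = [1, 2, 4, 8, 16, 32]

# log n is base 2; n.bit_length() - 1 equals log2(n) exactly when n is a power of 2.
FUNCS = {
    "1": lambda n: 1,
    "log n": lambda n: n.bit_length() - 1,
    "n": lambda n: n,
    "n log n": lambda n: n * (n.bit_length() - 1),
    "n^2": lambda n: n**2,
    "n^3": lambda n: n**3,
    "2^n": lambda n: 2**n,
}

print(f"{'':>8}" + "".join(f"{'n=' + str(n):>15}" for n in SIZES))
for name, f in FUNCS.items():
    print(f"{name:>8}" + "".join(f"{f(n):>15,}" for n in SIZES))
```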



Big-Omega and Big-Theta Notations

Big-Oh provides an asymptotic way of saying that a function f is less than or equal to another function g, i.e., f is O(g): f(n) = O(g(n)) if f(n) grows at the same rate as or slower than g(n). It represents an upper bound and is typically used to describe the worst case of an algorithm.

If f = Ω(g), then f is asymptotically at least as big as g. In other words, f(n) is Ω(g(n)) if there exist a real constant c > 0 and an integer $n_0 \ge 1$ such that $f(n) \ge c\,g(n)$ for all $n \ge n_0$. That is, f(n) = Ω(g(n)) when f(n) grows faster than or at the same rate as g(n). It represents a lower bound and is typically used to describe the best case of an algorithm.

If f = Θ(g), then f = O(g) and f = Ω(g): g is both an upper and a lower bound, so it is a tight fit. In other words, f(n) = Θ(g(n)) if there exist constants c' > 0 and c'' > 0 and an integer $n_0 \ge 1$ such that $c'\,g(n) \le f(n) \le c''\,g(n)$ for all $n \ge n_0$.



Some Words of Caution

Be careful about very large constant factors when using asymptotic notation. For example, an algorithm running in time 1,000,000n is still O(n), but it is less efficient than one running in time $2n^2$, which is O($n^2$), for every n < 500,000; the O(n) algorithm only wins for sufficiently large n. So when we compare two algorithms based on big-Oh notation, we should be careful about constants.
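The crossover point in the example can be computed exactly: $b n^2 > a n$ holds precisely when $n > a/b$, so the first integer where the quadratic algorithm becomes slower is $\lfloor a/b \rfloor + 1$. The helper name crossover is ours; its defaults are the constants from the text.

```python
def crossover(a=1_000_000, b=2):
    """Smallest integer n >= 1 with b*n^2 > a*n, for positive constants a and b.

    b*n^2 > a*n  <=>  n > a/b, so the answer is floor(a/b) + 1.
    """
    n = a // b + 1
    # sanity check: strict at n, and not yet strict at n - 1
    assert b * n * n > a * n and b * (n - 1) ** 2 <= a * (n - 1)
    return n


print(crossover())  # 500001
for n in (1_000, 500_000, 1_000_000):
    print(n, 1_000_000 * n, 2 * n * n)  # cost of the O(n) vs. the O(n^2) algorithm
```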

Little-oh and little-omega

f(n) grows slower than g(n) (or g(n) grows faster than f(n)) if $\lim_{n \to \infty} f(n)/g(n) = 0$. Notation: f(n) = o(g(n)), pronounced "little oh". In other words, f(n) is o(g(n)) if f(n) becomes insignificant in comparison to g(n) as n tends to infinity.

f(n) grows faster than g(n) (or g(n) grows slower than f(n)) if $\lim_{n \to \infty} f(n)/g(n) = \infty$. Notation: f(n) = ω(g(n)), pronounced "little omega".

If g(n) = o(f(n)), then f(n) = ω(g(n)).



Example of Asymptotic Analysis

Algorithm prefixAverage1(a):
    Input: an n-element array a of numbers
    Output: an n-element array b of numbers such that b[i] = (a[0] + a[1] + ... + a[i]) / (i + 1)
    for i ← 0 to n-1 do
        x ← 0
        for j ← 0 to i do
            x ← x + a[j]
        b[i] ← x / (i + 1)
    return array b

Analysis: the running time is O(n²), since the inner loop runs i + 1 times for each i, i.e. 1 + 2 + ... + n = n(n + 1)/2 times in total.

Algorithm prefixAverage2(a):
    Input: an n-element array a of numbers
    Output: an n-element array b of numbers such that b[i] = (a[0] + a[1] + ... + a[i]) / (i + 1)
    s ← 0
    for i ← 0 to n-1 do
        s ← s + a[i]
        b[i] ← s / (i + 1)
    return array b

Analysis: the running time is O(n).
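A direct Python sketch of the two algorithms above (snake_case names are our adaptation of prefixAverage1 and prefixAverage2). Both return the same list; the first recomputes each prefix sum from scratch in a nested loop, while the second carries a running sum s, which is what brings the cost down from O(n²) to O(n).

```python
def prefix_average1(a):
    """O(n^2): recompute a[0] + ... + a[i] from scratch for every i."""
    n = len(a)
    b = [0.0] * n
    for i in range(n):
        x = 0
        for j in range(i + 1):      # inner loop runs i + 1 times
            x += a[j]
        b[i] = x / (i + 1)
    return b


def prefix_average2(a):
    """O(n): maintain the running sum s = a[0] + ... + a[i]."""
    b = []
    s = 0
    for i, value in enumerate(a):
        s += value
        b.append(s / (i + 1))
    return b


data = [4, 8, 6, 2]
print(prefix_average1(data))  # [4.0, 6.0, 6.0, 5.0]
print(prefix_average2(data))  # [4.0, 6.0, 6.0, 5.0]
```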



Importance of Asymptotics



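A Quick Mathematical Review

$$\sum_{i=1}^{n} f(i) = f(1) + f(2) + \dots + f(n)$$

$$\sum_{i=0}^{n} a^i = \frac{1 - a^{n+1}}{1 - a} \quad (a \ne 1)$$

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$$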

Logarithmic identities

Floor and Ceiling functions
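The two closed-form sums in the review can be spot-checked with exact rational arithmetic. This is a sanity check for a range of n, not a proof (the arithmetic sum is proved by induction later in these notes); check_sums is our own helper name.

```python
from fractions import Fraction


def check_sums(n, a):
    """Compare direct sums with the closed forms from the review (requires a != 1)."""
    arithmetic = sum(range(1, n + 1)) == n * (n + 1) // 2
    geometric = sum(a**i for i in range(n + 1)) == (1 - a ** (n + 1)) / (1 - a)
    return arithmetic and geometric


print(all(check_sums(n, Fraction(1, 2)) for n in range(50)))  # True
print(all(check_sums(n, Fraction(3)) for n in range(50)))     # True
```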



Simple Justification Techniques

To prove that an algorithm or data structure is correct or fast, we need a mathematical argument. Instead of always relying on a full mathematical model, we can use some simple techniques to justify claims about our algorithms.

By Example

In this technique we find an example that shows a claim is wrong. If somebody claims that all odd numbers are prime, the example of 9 (3 × 3) proves the claim wrong. Such an instance is called a counterexample.
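Searching for a counterexample is often easy to automate. The sketch below (is_prime and first_counterexample are our own helpers, using simple trial division) scans odd numbers starting at 3 and returns the first one that is not prime, refuting the claim in the example.

```python
def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def first_counterexample(limit=1000):
    """Smallest odd n in [3, limit] that is not prime, or None if there is none."""
    for n in range(3, limit + 1, 2):
        if not is_prime(n):
            return n
    return None


print(first_counterexample())  # 9 = 3 x 3
```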

By Contrapositive

Proof by contrapositive takes advantage of the logical equivalence between "P implies Q" and "Not Q implies Not P". For example, the assertion "If it is my car, then it is red" is equivalent to "If that car is not red, then it is not mine". So, to prove "If P, Then Q" by the method of contrapositive means to prove "If Not Q, Then Not P".

Theorem. If x and y are two integers for which x + y is even, then x and y have the same parity.

Proof. The contrapositive version of this theorem is "If x and y are two integers with opposite parity, then their sum must be odd." So we assume x and y have opposite parity. Since one of these integers is even and the other odd, there is no loss of generality in supposing x is even and y is odd. Thus, there are integers k and m for which x = 2k and y = 2m + 1. Now we compute the sum x + y = 2k + 2m + 1 = 2(k + m) + 1, which is an odd integer by definition.



Simple Justification Techniques

In a proof by contradiction we assume, along with the hypotheses, the logical negation of the result we wish to prove, and then reach some kind of contradiction. That is, if we want to prove "If P, Then Q", we assume P and Not Q. The contradiction we arrive at could be some conclusion contradicting one of our assumptions, or something obviously untrue like 1 = 0.

Theorem. There are no positive integer solutions to the equation $x^2 - y^2 = 1$.

Proof (by contradiction). Assume to the contrary that there is a solution (x, y) where x and y are positive integers. If this is the case, we can factor the left side: $x^2 - y^2 = (x - y)(x + y) = 1$. Since x and y are integers, it follows that either x − y = 1 and x + y = 1, or x − y = −1 and x + y = −1. In the first case we can add the two equations to get x = 1 and y = 0, contradicting our assumption that x and y are positive. The second case is similar, giving x = −1 and y = 0, again contradicting our assumption.

The difference between the contrapositive method and the contradiction method is subtle. Let's examine how the two methods work when trying to prove "If P, Then Q".
- Method of contradiction: assume P and Not Q, and prove some sort of contradiction.
- Method of contrapositive: assume Not Q, and prove Not P.

The method of contrapositive has the advantage that your goal is clear: prove Not P. In the method of contradiction, your goal is to prove a contradiction, but it is not always clear at the start what the contradiction is going to be.



Simple Justification Techniques

Induction

If q(1) is true, and we can prove that whenever q(n) is true for an integer n ≥ 1, q(n + 1) is also true, then we can conclude that q(n) is true for all positive integers.

Theorem. For any positive integer n, 1 + 2 + ... + n = n(n + 1)/2.

Proof (by mathematical induction). Let P(n) be the statement "1 + 2 + ... + n = n(n + 1)/2." (The idea is that P(n) should be an assertion that, for any n, is verifiably either true or false.) The proof proceeds in two steps: the initial step and the inductive step.

Initial step. We must verify that P(1) is true. P(1) asserts "1 = 1(2)/2", which is clearly true. So we are done with the initial step.

Inductive step. Here we must prove the following assertion: "If there is a k such that P(k) is true, then (for this same k) P(k + 1) is true." Thus, we assume there is a k such that 1 + 2 + ... + k = k(k + 1)/2. (We call this the inductive assumption.) We must prove, for this same k, the formula 1 + 2 + ... + k + (k + 1) = (k + 1)(k + 2)/2. This is not too hard:

1 + 2 + ... + k + (k + 1) = k(k + 1)/2 + (k + 1) = (k(k + 1) + 2(k + 1))/2 = (k + 1)(k + 2)/2.

The first equality is a consequence of the inductive assumption.



Loop Invariant For
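The content of the loop-invariant slides did not survive extraction. As a hedged illustration of the idea, here is the standard textbook invariant for insertion sort (not necessarily the one on the original slides): at the start of each iteration of the for loop, a[0..j-1] holds the elements originally in a[0..j-1], in sorted order. The sketch (insertion_sort_checked is our own name) asserts the invariant on every iteration and the sortedness that it implies at termination.

```python
def insertion_sort_checked(a):
    """Insertion sort that asserts its loop invariant on every iteration.

    Invariant: at the start of the iteration with index j, a[0..j-1] holds
    the elements originally in a[0..j-1], in sorted order.
    """
    original = list(a)
    for j in range(1, len(a)):
        # Initialization (j = 1) and maintenance: the invariant holds here.
        assert a[:j] == sorted(original[:j])
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    # Termination: the loop ends with j = len(a), so the whole list is sorted.
    assert a == sorted(original)
    return a


print(insertion_sort_checked([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```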









Basic Probability

Independence: Two events A and B are independent if P(A ∩ B) = P(A) · P(B).

Conditional probability: The conditional probability that an event B occurs given an event A is denoted by P(B | A), which is defined as P(B | A) = P(A ∩ B) / P(A), provided P(A) > 0.

The expected value of a random variable is the weighted average of all possible values that the random variable can take on. The weights used in computing this average correspond to the probabilities in the case of a discrete random variable, or densities in the case of a continuous random variable.

For any two random variables X and Y, E(X + Y) = E(X) + E(Y). If X and Y are independent, then also E(XY) = E(X) · E(Y).

Suppose the random variable X can take value $x_1$ with probability $p_1$, value $x_2$ with probability $p_2$, and so on, and, lastly, value $x_k$ with probability $p_k$. Then the expectation of X is defined as

$$E(X) = x_1 p_1 + x_2 p_2 + \dots + x_k p_k$$
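A small sketch of the expectation formula for a discrete random variable, using a fair six-sided die as our own example and exact fractions so the results are not affected by rounding. It also illustrates linearity: the sum of two dice has expectation E(X) + E(Y) = 7. Independence is used here only to build the joint distribution; linearity itself does not require it.

```python
from fractions import Fraction


def expectation(dist):
    """E(X) = x1*p1 + x2*p2 + ... + xk*pk for a dict {value: probability}."""
    assert sum(dist.values()) == 1, "probabilities must sum to 1"
    return sum(x * p for x, p in dist.items())


die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expectation(die))  # 7/2

# Distribution of X + Y for two independent fair dice.
two_dice = {}
for x in die:
    for y in die:
        two_dice[x + y] = two_dice.get(x + y, 0) + die[x] * die[y]
print(expectation(two_dice))  # 7, which equals E(X) + E(Y)
```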


