Data Structure and Algorithm-1


  • 8/7/2019 Data Structre and Algorithm-1


    Data Structure and Algorithm

    Algorithm: An outline, the essence of a computational procedure; step-by-step instructions.

    Program: An implementation of an algorithm in some programming language.

    Data Structure: The organization of data needed to solve the problem.


    What is a good algorithm?

    Efficiency:
    - Running time
    - Space used

    Efficiency as a function of input size:
    - The number of bits in the input
    - The number of data elements


    Analysis of Algorithms

    The running time of an algorithm typically depends on:

    - Size of the input
    - Data structure
    - Hardware environment (clock speed, processor, memory, disk speed)
    - Software environment (OS, language, interpreted or compiled)

    If the hardware and software environments remain the same, the running time depends on the size of the input and the data structure.

    When experimenting with an algorithm on test inputs:

    - Test inputs must be representative.
    - Tests must be run in the same hardware and software environment.


    Analysis of Algorithms

    We need an analytical framework to analyze algorithms even without actual experimentation, one which:

    - Allows the efficiency of two algorithms to be compared
    - Can consider all possible inputs
    - Can be applied without actually implementing or running the algorithm

    Developing an analysis methodology involves several components:

    - A standard language for expressing the algorithm
    - A computational model for the algorithm
    - A metric for measuring the running time of the algorithm
    - A way of expressing running time in terms of input size, including for recursive algorithms


    Limitations of Experimental Studies

    - It is necessary to implement and test the algorithm in order to determine its running time.
    - Experiments can be done only on a limited set of inputs, while an infinite set of inputs can be created. The measured running time may therefore not be indicative, since many inputs are left out of the experiment.
    - For results to be comparable, the same hardware and software must be used in every experiment.


    Without Actual Experimentation

    We should develop a general methodology for analyzing the running time of algorithms. This approach:

    - Uses a high-level description of the algorithm instead of testing a few of its implementations.
    - Takes into account all possible inputs.
    - Allows one to evaluate the efficiency of an algorithm in a way that is independent of the hardware and software environment.


    Language for expressing the algorithms: Pseudo-code

    Pseudo-code is a way of expressing algorithms that uses a mixture of English phrases and indentation to make the steps in the solution explicit.

    - Pseudo-code mixes standard programming language constructs with natural language.
    - There are no grammar rules in pseudo-code.
    - Pseudo-code is not case sensitive.

    The following is the insertion sort algorithm on a zero-based array:

        for j ← 1 to length(A)-1
            key ← A[j]
            > A[j] is added to the sorted sequence A[0 .. j-1]
            i ← j - 1
            while i >= 0 and A[i] > key
                A[i+1] ← A[i]
                i ← i - 1
            A[i+1] ← key
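The zero-based insertion sort pseudo-code above translates almost line for line into a real language. The following is a minimal Python sketch; the function name `insertion_sort` is chosen here for illustration and does not come from the slides.

```python
def insertion_sort(a):
    """Sort the list `a` in place and return it (zero-based insertion sort)."""
    for j in range(1, len(a)):
        key = a[j]
        # a[0 .. j-1] is already sorted; find the slot for key inside it
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]  # shift the larger element one cell to the right
            i -= 1
        a[i + 1] = key
    return a


if __name__ == "__main__":
    print(insertion_sort([5, 2, 4, 6, 1, 3]))
```

Note that the loop bounds match the pseudo-code: `range(1, len(a))` covers j = 1 .. length(A)-1.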


    Pseudo Code Constructs

    The programming language constructs we use in pseudo-code are taken from high-level languages such as C and C++. They are as follows:

    - Expression: ← is used for assignment and = is used for comparison.
    - Method declaration: callx(param1, param2) is a method named callx with two input parameters, param1 and param2.
    - Programming constructs:
        - Decision structure: if <condition> then <actions> else <actions>. Indentation is used to show the scope.
        - While loop: while <condition> do <actions>. Here also indentation is used for scope.
        - For loop: for <loop range> do <actions>. Indentation is used to display the scope of the for loop.
        - Repeat loop: repeat <actions> until <condition>.
        - Array indexing: A[i] represents the i-th cell in the array A.
    - Methods:
        - Method call: object.method(), where method belongs to object.
        - Method return: the return value of the method.


    Random Access Machine (RAM) Model

    A Random Access Machine (RAM) is a theoretical computer model which has an unlimited number of registers of unlimited size which can be accessed randomly. A program is run step by step and consists of a limited number of instructions taken from a simple instruction set. The set includes very basic arithmetic operations, jump instructions, and direct and indirect access to the registers.

    In the RAM model, any primitive operation takes a constant number of steps, independent of the size of the input.


    Primitive Operations

    To get an idea of the running time of pseudo-code without actually implementing it, we define a set of high-level primitive operations that can be identified in the pseudo-code and are independent of the implementation. To estimate running time, we count the number of primitive operations the pseudo-code performs; two pieces of pseudo-code can then be compared by their counts. The primitive operations are:

    - Assigning a value to a variable
    - Calling a method
    - Performing an arithmetic operation
    - Comparing two numbers
    - Indexing into an array
    - Following an object reference
    - Returning from a method


    Analysis of Algorithm by counting primitive operations

    By inspecting the pseudo-code, we can count the number of primitive operations required by an algorithm, and thus get an idea of the running time of the code. For insertion sort on an array of n elements:

    Statement                                          Primitive operations
                                                       Best          Worst
    for j ← 2 to length(A)                             n-1           n-1
        key ← A[j]                                     n-1           n-1
        # A[j] is added to the sorted sequence A[1 .. j-1]
        i ← j - 1                                      n-1           n-1
        while i > 0 and A[i] > key                     (n-1)*2       (n-1)(n+2)/2
            A[i+1] ← A[i]                              0             n(n-1)/2
            i ← i - 1                                  0             n(n-1)/2
        A[i+1] ← key                                   n-1           n-1
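The best-case and worst-case counts for the shifting statement A[i+1] ← A[i] can be checked empirically. The sketch below is an illustration, not part of the slides: it instruments a zero-based Python version of insertion sort with a counter (the helper name `count_shifts` is hypothetical) and compares the result with n(n-1)/2.

```python
def count_shifts(values):
    """Insertion-sort a copy of `values`; return how often a[i+1] = a[i] runs."""
    a = list(values)
    shifts = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            shifts += 1  # one execution of the shifting statement
            i -= 1
        a[i + 1] = key
    return shifts


n = 10
best = count_shifts(range(n))           # already sorted input
worst = count_shifts(range(n, 0, -1))   # reverse-sorted input
print(best, worst, n * (n - 1) // 2)    # prints: 0 45 45
```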


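    Best-case, Average-case and worst-case analysis

    Let us do a cost analysis of insertion sort. Here t_k is the number of times the while-loop test is executed for j = k.

    Statement                                          Cost    Times
    for j ← 2 to n                                     c1      n
        key ← A[j]                                     c2      n-1
        # A[j] is added to the sorted sequence A[1 .. j-1]
        i ← j - 1                                      c3      n-1
        while i > 0 and A[i] > key                     c4      $\sum_{k=2}^{n} t_k$
            A[i+1] ← A[i]                              c5      $\sum_{k=2}^{n} (t_k - 1)$
            i ← i - 1                                  c6      $\sum_{k=2}^{n} (t_k - 1)$
        A[i+1] ← key                                   c7      n-1

    $$\text{Total Time} = c_1 n + (c_2 + c_3 + c_7)(n-1) + c_4 \sum_{k=2}^{n} t_k + (c_5 + c_6) \sum_{k=2}^{n} (t_k - 1)$$

    $$\phantom{\text{Total Time}} = n(c_1 + c_2 + c_3 + c_7) + (c_4 + c_5 + c_6) \sum_{k=2}^{n} t_k - (c_5 + c_6)(n-1) - (c_2 + c_3 + c_7)$$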

  • 8/7/2019 Data Structre and Algorithm-1

    13/36

    Best-case, Average-case and worst-case analysis

    - Best case for insertion sort: the numbers are already sorted. t_k = 1. Running time = f(n), linear in n.
    - Worst case: the numbers are sorted in reverse order. t_k = k. Running time = f(n^2), quadratic in n.
    - Average case: t_k = k/2. Running time = f(n^2).
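To see where the linear and quadratic running times come from, the three choices of t_k can be substituted into the sum over k = 2, ..., n that appears in the cost analysis. The short derivation below is a worked sketch using only the standard formula for the sum of the first n integers.

```latex
% Best case: t_k = 1
\sum_{k=2}^{n} t_k = \sum_{k=2}^{n} 1 = n - 1
\quad\Rightarrow\quad \text{linear in } n

% Worst case: t_k = k
\sum_{k=2}^{n} t_k = \sum_{k=2}^{n} k = \frac{n(n+1)}{2} - 1
\quad\Rightarrow\quad \text{quadratic in } n

% Average case: t_k = k/2
\sum_{k=2}^{n} t_k = \frac{1}{2}\sum_{k=2}^{n} k
= \frac{1}{2}\left(\frac{n(n+1)}{2} - 1\right)
\quad\Rightarrow\quad \text{still quadratic in } n
```

The average case halves the worst-case sum but keeps the n^2 term, which is why both are quadratic.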

    Time vs Input Graph


    Best-case, Average-case and worst-case analysis

    - The best, worst and average cases of a given algorithm express what the resource usage is at least, at most and on average, respectively.
    - Usually the resource being considered is running time, but it could also be memory or other resources.
    - In real-time computing, the worst-case execution time is often of particular concern, since it is important to know how much time might be needed in the worst case to guarantee that the algorithm will always finish on time.
    - Average performance and worst-case performance are the most used in algorithm analysis. Best-case performance is less widely used.
    - Probabilistic analysis techniques, especially expected value, are used to determine the average case.
    - The average case is often as bad as the worst case.
    - Finding the average case can be very difficult.


    Analyzing Recursive Algorithm


    T(n) denotes the running time of an algorithm on an input of size n.
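The body of this slide did not survive extraction, so the following is only an illustrative sketch of how T(n) is typically set up for a recursive algorithm, not the slide's own example. It assumes a hypothetical recursion that halves its input until it reaches 1, which gives the recurrence T(n) = T(n/2) + 1 with T(1) = 1, and compares the number of calls with floor(log2 n) + 1.

```python
import math


def halving_calls(n):
    """Number of calls made by a recursion that halves n until it reaches 1.

    Models the recurrence T(n) = T(n // 2) + 1 with T(1) = 1.
    """
    if n <= 1:
        return 1
    return 1 + halving_calls(n // 2)


for n in (1, 2, 8, 1024):
    print(n, halving_calls(n), math.floor(math.log2(n)) + 1)
```

For these inputs the two columns agree, which is consistent with T(n) being O(log n) for this recurrence.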


    Asymptotic Analysis

    Goal: To simplify the analysis of running time by getting rid of details which may be affected by the specific implementation and hardware.

    - Ignoring constants in T(n)
    - Analyzing T(n) as n "gets large"

    Capturing the essence: how the running time of an algorithm increases with the size of the input in the limit.

    Example: $T(n) = 13n^3 + 42n^2 + 2n\log n + 4n$

    As n grows larger, $n^3$ is MUCH larger than $n^2$, $n\log n$, and $n$, so it dominates $T(n)$.

    The running time grows "roughly on the order of $n^3$".

    Notationally, $T(n) = O(n^3)$.


    Asymptotic Notation

    Definition (Big Oh): Consider a function f(n) which is non-negative for all integers n > 0. We say that "f(n) is big oh g(n)", which we write f(n) = O(g(n)), if there exists an integer n0 and a constant c > 0 such that for all integers n >= n0, f(n) <= c g(n).


    Asymptotic Notation

    Big O notation is a huge simplification; can we justify it?

    - It only makes sense for large problem sizes.
    - For sufficiently large problem sizes, the highest-order term swamps all the rest!

    Consider R = x^2 + 3x + 5 as x varies:

    x           x^2               3x          5     R
    0           0                 0           5     5
    10          100               30          5     135
    100         10,000            300         5     10,305
    1,000       1,000,000         3,000       5     1,003,005
    10,000      100,000,000       30,000      5     100,030,005
    100,000     10,000,000,000    300,000     5     10,000,300,005
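How quickly the highest-order term takes over can be made concrete by computing what fraction of R comes from x^2. The sketch below is illustrative; the helper name `dominant_share` is an assumption introduced here.

```python
def dominant_share(x):
    """Fraction of R = x^2 + 3x + 5 that is contributed by the x^2 term."""
    r = x * x + 3 * x + 5
    return (x * x) / r


for x in (0, 10, 100, 1000, 10_000, 100_000):
    print(f"x = {x:>7}: x^2 is {dominant_share(x):.4%} of R")
```

The share starts at zero for x = 0 and approaches 100% as x grows, which is the justification for keeping only the highest-order term.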


    Some properties of Big Oh notation

    - The fastest growing function dominates a sum: O(f(n) + g(n)) is O(max{f(n), g(n)}).
    - The product of upper bounds is an upper bound for the product: if f is O(g) and h is O(r), then fh is O(gr).
    - "f is O(g)" is transitive: if f is O(g) and g is O(h), then f is O(h).
    - If d is O(f), then ad is O(f) for a > 0.
    - If d is O(f) and e is O(g), then d + e is O(f + g).
    - If f(n) = a_0 + a_1 n + ... + a_d n^d, then f(n) is O(n^d).
    - log(n^x) is O(log n) for any fixed x > 0.
    - Hierarchy of functions: O(1), O(log n), O(n^(1/2)), O(n log n), O(n^2), O(2^n), O(n!)


    An Example of solving Big O

    Thus c = 35 and n0 = 1.

    Other Examples

    f(n) = 5n + 2 = O(n)        // g(n) = n
        f(n) <= 6n for n >= 3 (c = 6, n0 = 3)

    f(n) = n/2 - 3 = O(n)
        f(n) <= 0.5n for n >= 0 (c = 0.5, n0 = 0)
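The constants c and n0 quoted in these examples can be sanity-checked numerically. The sketch below uses a hypothetical helper, `holds_up_to`, that tests f(n) <= c·g(n) over a finite range; a finite check like this can expose a wrong pair (c, n0) but is not a proof that the bound holds for all n.

```python
def holds_up_to(f, g, c, n0, limit=10_000):
    """Check f(n) <= c * g(n) for every integer n in [n0, limit]."""
    return all(f(n) <= c * g(n) for n in range(n0, limit + 1))


# f(n) = 5n + 2 with c = 6, n0 = 3
print(holds_up_to(lambda n: 5 * n + 2, lambda n: n, c=6, n0=3))      # True
# f(n) = n/2 - 3 with c = 0.5, n0 = 0
print(holds_up_to(lambda n: n / 2 - 3, lambda n: n, c=0.5, n0=0))    # True
# A constant that is too small fails: 5n + 2 <= 5n never holds
print(holds_up_to(lambda n: 5 * n + 2, lambda n: n, c=5, n0=1))      # False
```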


    Classifying Algorithm Based on Big Oh

    - A function f(n) is said to be of at most logarithmic growth if f(n) = O(log n).
    - A function f(n) is said to be of at most quadratic growth if f(n) = O(n^2).
    - A function f(n) is said to be of at most polynomial growth if f(n) = O(n^k), for some natural number k > 1.
    - A function f(n) is said to be of at most exponential growth if there is a constant c > 1 such that f(n) = O(c^n).
    - A function f(n) is said to be of at most factorial growth if f(n) = O(n!).
    - A function f(n) is said to have constant running time if the size of the input n has no effect on the running time of the algorithm (e.g., assignment of a value to a variable). The equation for this algorithm is f(n) = c.
    - Other logarithmic classifications: f(n) = O(n log n) and f(n) = O(log log n).


    Growth Rates Compared

               n=1    n=2    n=4    n=8    n=16     n=32
    1          1      1      1      1      1        1
    log n      0      1      2      3      4        5
    n          1      2      4      8      16       32
    n log n    0      2      8      24     64       160
    n^2        1      4      16     64     256      1024
    n^3        1      8      64     512    4096     32768
    2^n        2      4      16     256    65536    4294967296
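The table can be regenerated with a few lines of code, which also makes it easy to extend to larger n. This is a small sketch; the names `FUNCS`, `SIZES` and `growth_table` are introduced here for illustration, and the logarithm is taken base 2, as the table values imply.

```python
# Every size is a power of two, so log2(n) is exactly n.bit_length() - 1.
SIZES = [1, 2, 4, 8, 16, 32]

FUNCS = [
    ("1", lambda n: 1),
    ("log n", lambda n: n.bit_length() - 1),
    ("n", lambda n: n),
    ("n log n", lambda n: n * (n.bit_length() - 1)),
    ("n^2", lambda n: n ** 2),
    ("n^3", lambda n: n ** 3),
    ("2^n", lambda n: 2 ** n),
]


def growth_table():
    """Map each function name to its values at the sizes in SIZES."""
    return {name: [f(n) for n in SIZES] for name, f in FUNCS}


for name, row in growth_table().items():
    print(f"{name:<8}", *(f"{value:>11}" for value in row))
```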


    Big-Omega and Big-Theta notations

    Big-Oh provides an asymptotic way of saying that a function f is less than or equal to another function g, i.e. f is O(g).

    - f(n) = O(g(n)) if f(n) grows at the same rate as or slower than g(n).
    - It represents an upper bound on the algorithm, and is commonly used to bound the worst case.

    If f = Ω(g), then f is at least as big as g.

    - In other words, f(n) is Ω(g(n)) if there is a real constant c > 0 and an integer n0 >= 1 such that f(n) >= c g(n) for n >= n0.
    - f(n) grows faster than or at the same rate as g(n): f(n) = Ω(g(n)).
    - It represents a lower bound on the algorithm, and is commonly used to bound the best case.

    If f = Θ(g), then f = O(g) and f = Ω(g) (g is both an upper and a lower bound; it is a tight fit).

    - In other words, there are constants c' > 0 and c'' > 0 and an integer n0 >= 1 such that c' g(n) <= f(n) <= c'' g(n) for n >= n0.


    Some Words of Caution

    Be careful about very large constant factors hidden by asymptotic notation. For example, an algorithm running in time 1,000,000n is still O(n), but it is less efficient than one running in time 2n^2, which is O(n^2), for every n below 500,000. So when comparing two algorithms based on big-Oh notation, we should be careful of the constants.
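The point at which the O(n) algorithm finally wins can be found by direct search. The sketch below is illustrative; the function names are assumptions introduced here, and the two cost functions are the ones quoted in the caution above.

```python
def linear_cost(n):
    return 1_000_000 * n


def quadratic_cost(n):
    return 2 * n * n


def crossover(limit=1_000_000):
    """Smallest n >= 1 at which the quadratic algorithm stops being cheaper."""
    for n in range(1, limit + 1):
        if quadratic_cost(n) >= linear_cost(n):
            return n
    return None  # no crossover found within the search limit


print(crossover())  # prints: 500000
```

Below that input size the asymptotically worse algorithm is the faster one, so the constants matter whenever realistic inputs are small.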

    Little-oh and little-Omega

    f(n) grows slower than g(n) (or g(n) grows faster than f(n)) if

    $$\lim_{n \to \infty} \frac{f(n)}{g(n)} = 0$$

    Notation: f(n) = o(g(n)), pronounced "little oh".

    In other words, f(n) is o(g(n)) if f(n) becomes insignificant in comparison to g(n) as n tends to infinity.

    f(n) grows faster than g(n) (or g(n) grows slower than f(n)) if

    $$\lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty$$

    Notation: f(n) = ω(g(n)), pronounced "little omega".

    If g(n) = o(f(n)) then f(n) = ω(g(n)).


    Example of Asymptotic Analysis

    Algorithm prefixAverage1(a):
        Input: An n-element array a of numbers
        Output: An n-element array b of numbers such that b[i] = (a[0] + a[1] + ... + a[i]) / (i+1)
        for i ← 0 to n-1 do
            x ← 0
            for j ← 0 to i do
                x ← x + a[j]
            b[i] ← x / (i+1)
        return array b

    Analysis: Running time is O(n^2)

    Algorithm prefixAverage2(a):
        Input: An n-element array a of numbers
        Output: An n-element array b of numbers such that b[i] = (a[0] + a[1] + ... + a[i]) / (i+1)
        s ← 0
        for i ← 0 to n-1 do
            s ← s + a[i]
            b[i] ← s / (i+1)
        return array b

    Analysis: Running time is O(n)
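Both prefix-average algorithms are short enough to run side by side. The following Python sketch keeps the structure of the pseudo-code; the snake_case function names are chosen here and the inner loop of the first version sums a[0] through a[i], as the output specification requires.

```python
def prefix_average1(a):
    """Quadratic version: recompute the running sum from scratch for each i."""
    n = len(a)
    b = [0.0] * n
    for i in range(n):
        x = 0
        for j in range(i + 1):  # j = 0 .. i
            x += a[j]
        b[i] = x / (i + 1)
    return b


def prefix_average2(a):
    """Linear version: carry the running sum s from one index to the next."""
    b = []
    s = 0
    for i, value in enumerate(a):
        s += value
        b.append(s / (i + 1))
    return b


data = [1, 2, 3, 4]
print(prefix_average1(data))  # [1.0, 1.5, 2.0, 2.5]
print(prefix_average2(data))  # [1.0, 1.5, 2.0, 2.5]
```

The two functions return the same result; the second one simply avoids repeating the additions already done for earlier indices.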


    Importance of Asymptotics


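    A Quick Mathematical Review

    Summations

    $$\sum_{i=1}^{n} f(i) = f(1) + f(2) + \dots + f(n)$$

    $$\sum_{i=0}^{n} a^i = \frac{1 - a^{n+1}}{1 - a} \quad (a \neq 1)$$

    $$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$$

    Logarithmic identities

    Floor and Ceiling functions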


    Simple Justification Techniques

    To prove that an algorithm or data structure is correct or fast, we need to reason about it with a mathematical model. Short of a full formal treatment, a few simple techniques can be used to justify or refute claims about an algorithm.

    By Example

    We can look for an example that proves a claim wrong. If somebody claims that all odd numbers are prime, then the example 9 (3 × 3) proves the claim wrong. Such an instance is called a counterexample.

    By Contrapositive

    Proof by contrapositive takes advantage of the logical equivalence between "P implies Q" and "Not Q implies Not P". For example, the assertion "If it is my car, then it is red" is equivalent to "If that car is not red, then it is not mine". So, to prove "If P, Then Q" by the method of contrapositive means to prove "If Not Q, Then Not P".

    Claim: If x and y are two integers for which x + y is even, then x and y have the same parity.

    Proof. The contrapositive version of this theorem is "If x and y are two integers with opposite parity, then their sum must be odd." So we assume x and y have opposite parity. Since one of these integers is even and the other odd, there is no loss of generality in supposing that x is even and y is odd. Thus, there are integers k and m for which x = 2k and y = 2m + 1. We then compute the sum x + y = 2k + 2m + 1 = 2(k + m) + 1, which is an odd integer by definition.


    Simple Justification Techniques

    By Contradiction

    In a proof by contradiction we assume, along with the hypotheses, the logical negation of the result we wish to prove, and then reach some kind of contradiction. That is, if we want to prove "If P, Then Q", we assume P and Not Q. The contradiction we arrive at could be some conclusion contradicting one of our assumptions, or something obviously untrue like 1 = 0.

    Claim: There are no positive integer solutions to the equation x^2 - y^2 = 1.

    Proof. (Proof by Contradiction.) Assume to the contrary that there is a solution (x, y) where x and y are positive integers. If this is the case, we can factor the left side: x^2 - y^2 = (x - y)(x + y) = 1. Since x and y are integers, it follows that either x - y = 1 and x + y = 1, or x - y = -1 and x + y = -1. In the first case we can add the two equations to get x = 1 and y = 0, contradicting our assumption that x and y are positive. The second case is similar, giving x = -1 and y = 0, again contradicting our assumption.

    The difference between the Contrapositive method and the Contradiction method is subtle. Let's examine how the two methods work when trying to prove "If P, Then Q".

    - Method of Contradiction: Assume P and Not Q and prove some sort of contradiction.
    - Method of Contrapositive: Assume Not Q and prove Not P.

    The method of Contrapositive has the advantage that your goal is clear: prove Not P. In the method of Contradiction, your goal is to prove a contradiction, but it is not always clear at the start what the contradiction is going to be.


    Simple Justification Techniques

    Induction

    If q(1) is true, and we can prove that q(n + 1) is true whenever q(n) is true, then we can conclude that q(n) is true for all positive integers n.

    Claim: For any positive integer n, 1 + 2 + ... + n = n(n + 1)/2.

    Proof. (Proof by Mathematical Induction.) Let P(n) be the statement "1 + 2 + ... + n = n(n + 1)/2". (The idea is that P(n) should be an assertion that for any n is verifiably either true or false.) The proof proceeds in two steps: the initial step and the inductive step.

    Initial Step. We must verify that P(1) is true. P(1) asserts "1 = 1(2)/2", which is clearly true. So we are done with the initial step.

    Inductive Step. Here we must prove the following assertion: "If there is a k such that P(k) is true, then (for this same k) P(k + 1) is true." Thus, we assume there is a k such that 1 + 2 + ... + k = k(k + 1)/2. (We call this the inductive assumption.) We must prove, for this same k, the formula 1 + 2 + ... + k + (k + 1) = (k + 1)(k + 2)/2.

    This is not too hard: 1 + 2 + ... + k + (k + 1) = k(k + 1)/2 + (k + 1) = (k(k + 1) + 2(k + 1))/2 = (k + 1)(k + 2)/2. The first equality is a consequence of the inductive assumption.
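The two ingredients of the induction proof can be mirrored in code as a sanity check. The sketch below is an illustration only: checking finitely many values does not replace the proof, and the helper names are introduced here.

```python
def formula_holds(n):
    """P(n): 1 + 2 + ... + n equals n(n + 1)/2."""
    return sum(range(1, n + 1)) == n * (n + 1) // 2


def inductive_step_holds(k):
    """Algebra of the inductive step: k(k+1)/2 + (k+1) equals (k+1)(k+2)/2."""
    return k * (k + 1) // 2 + (k + 1) == (k + 1) * (k + 2) // 2


print(formula_holds(1))                                        # initial step
print(all(inductive_step_holds(k) for k in range(1, 1001)))    # inductive step
print(all(formula_holds(n) for n in range(1, 1001)))           # resulting claim
```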


    Loop Invariant


    Basic Probability

    Independence: Two events A and B are independent if P(A ∩ B) = P(A) · P(B).

    Conditional Probability: The conditional probability that an event B occurs given an event A is denoted by P(B | A), which is defined as

    $$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad P(A) > 0$$

    The expected value of a random variable is the weighted average of all possible values that this random variable can take on. The weights used in computing this average correspond to the probabilities in the case of a discrete random variable, or to the densities in the case of a continuous random variable.

    If X and Y are two arbitrary random variables, then E(X + Y) = E(X) + E(Y). If X and Y are also independent, then E(XY) = E(X) · E(Y).

    Suppose a random variable X can take value x1 with probability p1, value x2 with probability p2, and so on, up to value xk with probability pk. Then the expectation of this random variable X is defined as

    E(X) = x1·p1 + x2·p2 + ... + xk·pk
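The definition of expectation is a direct weighted sum, so it is easy to compute for a small discrete distribution. The sketch below uses a fair six-sided die as an assumed example (it is not from the slides) and exact fractions to avoid rounding; `expected_value` is a hypothetical helper name.

```python
from fractions import Fraction


def expected_value(outcomes):
    """E(X) for a list of (value, probability) pairs whose probabilities sum to 1."""
    outcomes = list(outcomes)
    assert sum(p for _, p in outcomes) == 1, "probabilities must sum to 1"
    return sum(x * p for x, p in outcomes)


# A fair die: each face 1..6 has probability 1/6
die = [(face, Fraction(1, 6)) for face in range(1, 7)]
print(expected_value(die))  # prints: 7/2

# Linearity of expectation: the sum of two dice has expectation 7/2 + 7/2
two_dice = [(a + b, pa * pb) for a, pa in die for b, pb in die]
print(expected_value(two_dice))  # prints: 7
```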
