Foundations of Software Design
Fall 2002
Marti Hearst

Lecture 10: Math Review, Intro to Analysis of Algorithms


Today

• Math review:
  – Exponents and logarithms
  – Functions and graphs

• Intro to Analysis of Algorithms


Functions, Graphs of Functions

• Function: a rule that
  – Converts inputs to outputs in a well-defined way.
  – This conversion process is often called mapping.
  – Input space called the domain
  – Output space called the range


• Examples
  – Mapping of speed of bicycling to calories burned
    • Domain: values for speed
    • Range: values for calories
  – Mapping of people to names
    • Domain: people
    • Range: Strings


Example: How many calories does bicycling burn?

Miles/Hour vs. KiloCalories/Minute

For a 150 lb rider.
Green: riding on a gravel road with a mountain bike
Blue: riding on a paved road with a touring bicycle
Red: racing bicyclist

From Whitt, F.R. & D. G. Wilson. 1982. Bicycling Science (second edition). http://www.frontier.iarc.uaf.edu/~cswingle/misc/exercise.phtml


Functions and Graphs of Functions

• Notation
  – Many different kinds
  – f(x) is read “f of x”
  – This means a function called f takes x as an argument
  – f(x) = y
  – This means the function f takes x as input and produces y as output.
  – The rule for the mapping is hidden by the notation.


Here f(x) = 7x

A point on this graph can be called (x,y) or (x, f(x))

http://www.sosmath.com/algebra/logs/log1/log1.html


We also say y = f(x)

A straight line is defined by the function y = ax + b, where a and b are constants and x is the variable.

http://www.sosmath.com/algebra/logs/log1/log1.html
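The function notation can be mirrored in Java. The following is a minimal sketch, not part of the original slides: the class name `LineFunctionDemo` and the helper `line` are made up for illustration. It builds the straight-line function y = ax + b for fixed constants a and b and prints a few points (x, f(x)).

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical illustration: a function as a rule that maps an input x to an output f(x).
final class LineFunctionDemo {
    // Builds the straight-line function y = a*x + b for fixed constants a and b.
    static DoubleUnaryOperator line(double a, double b) {
        return x -> a * x + b;
    }

    public static void main(String[] args) {
        DoubleUnaryOperator f = line(7.0, 0.0); // f(x) = 7x
        for (double x = 0.0; x <= 3.0; x += 1.0) {
            // Each printed pair is a point (x, f(x)) on the graph of f.
            System.out.println("(" + x + ", " + f.applyAsDouble(x) + ")");
        }
    }
}
```

The caller only sees `f.applyAsDouble(x)`; as with f(x), the rule for the mapping stays hidden behind the notation.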


Exponents and Logarithms

• Exponents: shorthand for repeated multiplication
• Logarithms: shorthand for exponents
• How do we use these?
  – Difficult computational problems grow exponentially
  – Logarithms are useful for “squashing” them


The exponential function f with base a is denoted by $f(x) = a^x$, where x is any real number.
Note how much more quickly the graph grows than the linear graph of f(x) = x.

Example: If the base is 2 and x = 4, the function value f(4) will equal 16. A corresponding point on the graph would be (4, 16).

http://www.sosmath.com/algebra/logs/log1/log1.html


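For $x > 0$, $a > 0$, $a \neq 1$:

$f(x) = \log_a(x)$ if and only if $a^{f(x)} = x$

$f(x) = \log_2(x)$ if and only if $2^{f(x)} = x$

Examples:

$\log_2(16) = 4$ because $2^4 = 16$

$\log_5\left(\frac{1}{25}\right) = -2$ because $5^{-2} = \frac{1}{5 \cdot 5} = \frac{1}{25}$

http://www.sosmath.com/algebra/logs/log1/log1.html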


Logarithmic functions are the inverse of exponential functions. If (4, 16) is a point on the graph of an exponential function, then (16, 4) would be the corresponding point on the graph of the inverse logarithmic function.

http://www.sosmath.com/algebra/logs/log1/log1.html
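The inverse relationship can be checked numerically. This is a small sketch under stated assumptions: the class `LogDemo` and its helper `log` are illustrative names, and the base-a logarithm is computed with the change-of-base identity, since `java.lang.Math` provides only the natural and base-10 logarithms.

```java
// Hypothetical illustration of exponentials and logarithms as inverse functions.
final class LogDemo {
    // Logarithm of x in base a (a > 0, a != 1, x > 0),
    // using the identity log_a(x) = ln(x) / ln(a).
    static double log(double a, double x) {
        return Math.log(x) / Math.log(a);
    }

    public static void main(String[] args) {
        double y = Math.pow(2.0, 4.0); // exponential with base 2: the point (4, 16)
        double x = log(2.0, y);        // the logarithm maps 16 back to (about) 4
        System.out.println("2^4 = " + y);
        System.out.println("log_2(" + y + ") = " + x);
    }
}
```

Because the computation uses floating-point arithmetic, the recovered exponent may differ from the exact value by a tiny rounding error.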


Zipf Distribution (linear and log scale)

Illustration by Jakob Nielsen


Zipf Curves for Term Frequency

Rank  Freq  Term
1     37    system
2     32    knowledg
3     24    base
4     20    problem
5     18    abstract
6     15    model
7     15    languag
8     15    implem
9     13    reason
10    13    inform
11    11    expert
12    11    analysi
13    10    rule
14    10    program
15    10    oper
16    10    evalu
17    10    comput
18    10    case
19    9     gener
20    9     form


Zoom in on the Knee of the Curve

Rank  Freq  Term
43    6     approach
44    5     work
45    5     variabl
46    5     theori
47    5     specif
48    5     softwar
49    5     requir
50    5     potenti
51    5     method
52    5     mean
53    5     inher
54    5     data
55    5     commit
56    5     applic
57    4     tool
58    4     technolog
59    4     techniqu


Other Functions

• Quadratic function: $f(x) = ax^2 + bx + c$
• This is a graph of: $f(x) = x^2 - 4$, which crosses the x axis at (-2,0) and (2,0)

http://www.sosmath.com/algebra/


Other Functions

• This one takes analysis to figure out
• Graph of: f(x) = -0.3(x+2)x(x-1)

http://www.sosmath.com/algebra/
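One way to start that analysis is to tabulate the function. The sketch below is illustrative only (the class `PolynomialDemo` is a made-up name): it evaluates f(x) = -0.3(x+2)x(x-1) at evenly spaced points. The factored form already suggests where the graph touches the x axis, at x = -2, x = 0, and x = 1, and the printed values show the sign changes between those points.

```java
// Hypothetical illustration: sampling a cubic to see the shape of its graph.
final class PolynomialDemo {
    // f(x) = -0.3 (x + 2) x (x - 1), the function named on the slide.
    static double cubic(double x) {
        return -0.3 * (x + 2) * x * (x - 1);
    }

    public static void main(String[] args) {
        // Step through the interval [-3, 2] in increments of 0.5.
        for (double x = -3.0; x <= 2.0; x += 0.5) {
            System.out.printf("x = %5.1f  f(x) = %7.3f%n", x, cubic(x));
        }
    }
}
```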


Function Pecking Order

• In increasing order

log(n)  n     n^2      n^5       2^n
1       2     4        32        4
2       4     16       1024      16
3       8     64       32768     256
4       16    256      1048576   65536
5       32    1024     33554432  4.29E+09
6       64    4096     1.07E+09  1.84E+19
7       128   16384    3.44E+10  3.4E+38
8       256   65536    1.1E+12   1.16E+77
9       512   262144   3.52E+13  1.3E+154
10      1024  1048576  1.13E+15  #NUM!

Adapted from Goodrich & Tamassia
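A table like this can be regenerated with a few lines of Java. This is a sketch under assumptions: the class `GrowthTable` and the helper `twoToThe` are illustrative names, n is taken to be 2^k for k = 1 to 10 (so the logarithm column is base 2 and equals k), and `double` arithmetic stands in for the spreadsheet. In Java, 2^1024 exceeds the largest `double`, so `Math.pow` returns infinity for the last row, which corresponds to the spreadsheet's #NUM! entry.

```java
// Hypothetical illustration: how fast each function grows as n doubles.
final class GrowthTable {
    // 2^n as a double; overflows to Infinity once n reaches 1024.
    static double twoToThe(double n) {
        return Math.pow(2.0, n);
    }

    public static void main(String[] args) {
        System.out.printf("%7s %6s %9s %12s %12s%n", "log2(n)", "n", "n^2", "n^5", "2^n");
        for (int k = 1; k <= 10; k++) {
            double n = Math.pow(2.0, k); // n = 2^k, so log2(n) = k
            System.out.printf("%7d %6.0f %9.0f %12.3g %12.3g%n",
                    k, n, n * n, Math.pow(n, 5), twoToThe(n));
        }
    }
}
```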


Summation Notation

$\sum_{i=1}^{n} i = 1 + 2 + 3 + \dots + n = \frac{n(n+1)}{2}$


Summation Notation

$\sum_{i=1}^{n} i = 1 + 2 + 3 + \dots + n = \frac{n(n+1)}{2}$

$\sum_{i=1}^{n-1} i = 1 + 2 + 3 + \dots + (n-1) = \frac{(n-1)(n)}{2}$


Summation Notation and Java

$\sum_{i=0}^{n-1} i^2 = 0 + 1 + 4 + 9 + \dots + (n-1)^2$
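A summation translates almost directly into a Java `for` loop. The sketch below is one possible rendering, not necessarily the code shown in the lecture; the class and method names are made up. It computes the sum of squares 0 + 1 + 4 + 9 + ... + (n-1)^2: the loop variable i starts at the lower limit 0, stops before n, and each pass adds one term.

```java
// Hypothetical illustration: summation notation written as a loop.
final class SumOfSquares {
    // Returns 0^2 + 1^2 + ... + (n-1)^2.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) { // i runs from 0 to n-1, like the index of the sum
            sum += (long) i * i;      // add the term i^2
        }
        return sum;
    }

    public static void main(String[] args) {
        // For n = 5 the terms are 0, 1, 4, 9 and 16.
        System.out.println(sumOfSquares(5));
    }
}
```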


Iterative form vs. Closed Form

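Iterative form:

$\sum_{i=0}^{n} a^i = 1 + a + a^2 + a^3 + \dots + a^n$

Closed form (for $a \neq 1$):

$\sum_{i=0}^{n} a^i = \frac{1 - a^{n+1}}{1 - a}$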
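The difference between the two forms shows up clearly in code. In this sketch (the class `GeometricSum` and its method names are illustrative), the iterative form needs a loop with n + 1 additions, while the closed form for the geometric sum 1 + a + a^2 + ... + a^n is a single expression. The closed form assumes a is not 1.

```java
// Hypothetical illustration: the same geometric sum computed two ways.
final class GeometricSum {
    // Iterative form: add the terms a^0, a^1, ..., a^n one at a time.
    static double iterative(double a, int n) {
        double sum = 0.0;
        double term = 1.0; // a^0
        for (int i = 0; i <= n; i++) {
            sum += term;
            term *= a;     // next power of a
        }
        return sum;
    }

    // Closed form: (1 - a^(n+1)) / (1 - a), valid only when a != 1.
    static double closedForm(double a, int n) {
        return (1.0 - Math.pow(a, n + 1)) / (1.0 - a);
    }

    public static void main(String[] args) {
        System.out.println(iterative(2.0, 10));
        System.out.println(closedForm(2.0, 10));
    }
}
```

Both calls compute 1 + 2 + 4 + ... + 1024; the closed form does so without looping, which is why closed forms are convenient when counting steps.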


Why Analysis of Algorithms?

• To find out
  – How long an algorithm takes to run
  – How to compare different algorithms
  – This is done at a very abstract level
  – This can be done before code is written
• Alternative: Performance analysis
  – Actually time each operation as the program is running
  – Specific to the machine and the implementation of the algorithm
  – Specific, not abstract
  – Can only be done after code is written


Counting Primitive Operations

Algorithm ArrayMax(A, n)
Input: An array A storing n integers
Output: The maximum element in A.

    currentMax ← A[0]                  2 steps, + 1 to initialize i
    for i ← 1 to n-1 do                2 steps each time (compare i to n, inc i), n-1 times
        if currentMax < A[i] then      2 steps
            currentMax ← A[i]          2 steps. How often done?
    return currentMax                  1 step

How often the assignment inside the loop is done depends on the order in which the numbers appear in A[].

Between 4(n-1) and 6(n-1) steps in the loop

Adapted from Goodrich & Tamassia
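The count can be reproduced by instrumenting a Java version of ArrayMax. This is a sketch under assumptions: the class `ArrayMaxDemo` and the counter `loopSteps` are illustrative, and the charging scheme (2 steps for the loop test and increment, 2 for the comparison, 2 for the assignment) follows the tally on the slide rather than any measurement of real machine instructions. The array is assumed to be non-empty.

```java
// Hypothetical illustration: counting primitive steps inside the ArrayMax loop.
final class ArrayMaxDemo {
    // Steps charged inside the loop during the most recent call to arrayMax.
    static int loopSteps;

    static int arrayMax(int[] a) {
        int currentMax = a[0];
        loopSteps = 0;
        for (int i = 1; i < a.length; i++) {
            loopSteps += 2;     // compare i to n, increment i
            loopSteps += 2;     // index a[i], compare with currentMax
            if (currentMax < a[i]) {
                currentMax = a[i];
                loopSteps += 2; // index a[i], assign to currentMax
            }
        }
        return currentMax;
    }

    public static void main(String[] args) {
        int[] maxFirst = {9, 1, 2, 3, 4};  // assignment in the loop never runs
        int[] ascending = {1, 2, 3, 4, 9}; // assignment runs on every iteration
        System.out.println(arrayMax(maxFirst) + " found with " + loopSteps + " loop steps");
        System.out.println(arrayMax(ascending) + " found with " + loopSteps + " loop steps");
    }
}
```

With n = 5, the first input gives the lower end of the range, 4(n-1), and the second gives the upper end, 6(n-1).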


Algorithm Complexity

• Worst Case Complexity:
  – the function defined by the maximum number of steps taken on any instance of size n
• Best Case Complexity:
  – the function defined by the minimum number of steps taken on any instance of size n
• Average Case Complexity:
  – the function defined by the average number of steps taken on any instance of size n

Adapted from http://www.cs.sunysb.edu/~algorith/lectures-good/node1.html


Best, Worst, and Average Case Complexity

[Figure: number of steps plotted against N (input size), with separate curves labeled Worst Case Complexity, Average Case Complexity, and Best Case Complexity]

Adapted from http://www.cs.sunysb.edu/~algorith/lectures-good/node1.html


Doing the Analysis

• It’s hard to estimate the running time exactly
  – Best case depends on the input
  – Average case is difficult to compute
  – So we usually focus on worst case analysis
    • Easier to compute
    • Usually close to the actual running time
• Strategy: try to find upper and lower bounds of the worst case function.

[Figure: the actual function plotted between an upper bound and a lower bound]

Adapted from http://www.cs.sunysb.edu/~algorith/lectures-good/node2.html


Names of Bounding Functions

• f(n) is O(g(n)) means c*g(n) is an upper bound on f(n)
• f(n) is Ω(g(n)) means c*g(n) is a lower bound on f(n)
• f(n) is Θ(g(n)) means c1*g(n) is an upper bound on f(n) and c2*g(n) is a lower bound on f(n)
• If f(n) is O(g(n)) and f(n) is Ω(g(n)) then f(n) is Θ(g(n))
• Here c, c1 and c2 are constants.

Adapted from http://www.cs.sunysb.edu/~algorith/lectures-good/node2.html


Constants versus n

• We usually focus on big-oh.
• It is important to understand the difference between constants and n
  – A constant has a fixed value, doesn’t change
    • Doesn’t really matter what the value of the constant is.
  – n reflects the size of the problem
    • So n can get really really big
  – This is why we talk about the time in terms of a function of n
• In the ArrayMax example,
  – We don’t really need to pay attention to 4(n-1) versus 6(n-1)
  – They are both order n

Adapted from http://www.cs.sunysb.edu/~algorith/lectures-good/node2.html


Problems that have large n

• Put a list of all contributors to the 2000 presidential campaigns into alphabetical order.

• Run a Photoshop-style filter across all the pixels of a high-resolution image.

• Others?


Constraining Large Problems

• Number of ways there are to fly roundtrip from the Bay Area to Washington DC.
  – Here n is the number of available flights on a given day throughout the country
  – But have to add lots of constraints too
    • Choose SFO, OAK, or SJ
    • Choose BWI, Dulles, or National
    • Choose airline
    • Direct, one stop, two stops?
      – Connect through Dallas or Denver or Chicago or LAX or …
    • How long must be allowed for layovers?
    • Which combos are cheapest?
    • What about open-jaw?
  – If you try all possible combinations, it will take a very long time to run!!


The Crossover Point

Adapted from http://www.cs.sunysb.edu/~algorith/lectures-good/node2.html

One function starts out faster for small values of n.
But for n > n0, the other function is always faster.
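A crossover point can be located by brute force for a concrete pair of functions. The sketch below uses two hypothetical running-time functions chosen only for illustration, 100n and n^2; neither comes from the lecture, and the class `CrossoverDemo` is a made-up name. It scans downward from a search limit and reports the largest n at which the linear function is not yet faster.

```java
// Hypothetical illustration: finding the crossover point n0 of two cost functions.
final class CrossoverDemo {
    // Assumed costs: f is linear with a large constant, g is quadratic.
    static long f(long n) { return 100 * n; }
    static long g(long n) { return n * n; }

    // Returns n0 such that f(n) < g(n) for every n with n0 < n <= limit.
    // A result equal to limit means no crossover was seen in the searched range.
    static long crossover(long limit) {
        for (long n = limit; n >= 1; n--) {
            if (f(n) >= g(n)) {
                return n; // largest n at which f is not yet faster than g
            }
        }
        return 0; // f is faster everywhere in [1, limit]
    }

    public static void main(String[] args) {
        System.out.println("g is faster at n = 10: " + (g(10) < f(10)));
        System.out.println("n0 = " + crossover(1000));
    }
}
```

For these two functions the quadratic one wins for small n, and the linear one is faster for every n above the reported n0 within the searched range.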


More formally

• Let f(n) and g(n) be functions mapping nonnegative integers to real numbers.
• f(n) is O(g(n)) if there exist positive constants n0 and c such that for all n >= n0, f(n) <= c*g(n)
• Other ways to say this:
  – f(n) is order g(n)
  – f(n) is big-Oh of g(n)
  – f(n) is Oh of g(n)
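The definition can be tested numerically for a particular choice of constants. The sketch below is only a sanity check over a finite range, not a proof. It takes f(n) = 6(n-1), the upper end of the ArrayMax loop count, and g(n) = n, and checks the inequality f(n) <= c*g(n) for candidate constants c and n0; the class `BigOhCheck` and the method `holds` are illustrative names.

```java
// Hypothetical illustration: checking candidate big-Oh constants c and n0.
final class BigOhCheck {
    static double f(long n) { return 6.0 * (n - 1); } // assumed step count
    static double g(long n) { return n; }             // candidate bounding function

    // True if f(n) <= c * g(n) for every n with n0 <= n <= limit.
    static boolean holds(double c, long n0, long limit) {
        for (long n = n0; n <= limit; n++) {
            if (f(n) > c * g(n)) {
                return false; // found a counterexample to this choice of c and n0
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("c = 6, n0 = 1: " + holds(6.0, 1, 100_000));
        System.out.println("c = 5, n0 = 1: " + holds(5.0, 1, 100_000));
    }
}
```

Here c = 6 works because 6(n-1) = 6n - 6 never exceeds 6n, while c = 5 fails as soon as n is larger than 6. Only one working pair (c, n0) is needed for f(n) to be O(n).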


Plot them!

[Figure: two plots of log(n), n, n^2, n^5, and 2^n, with 1 through 10 on the x axis. Left panel: both x and y linear scales (y axis up to 1.6E+154). Right panel: y axis converted to log scale (1 to 1E+156).]

(that jump for large n happens because the last number is out of range)

Notice how much bigger 2^n is than n^k

This is why exponential growth is BAD BAD BAD!!


Summary: Analysis of Algorithms

• A method for determining, in an abstract way, the asymptotic running time of an algorithm
  – Here asymptotic means as n gets very large
• Useful for comparing algorithms
• Useful also for determining tractability
  – Meaning, a way to determine if the problem is intractable (impossible to solve in a practical amount of time) or not
  – Exponential time algorithms are usually intractable.

• We’ll revisit these ideas throughout the rest of the course.