1 Algorithms Starring: Binary Search Co Starring: Big-O

1

Algorithms

Starring: Binary Search

Co-Starring: Big-O

2

Purpose:

The ability to effectively process a large volume of data is a critical element in systems design.

If we had to maintain information on 10 licensed drivers, we could code it almost any way we wished.

3

Purpose:

However, if that list grew from 10 to 10 million, the WAY we store, order & retrieve this data would become critically important.

4

Resources:

Java Essentials Chapter 18 p.703

Java Essentials Study Guide Chapter 15 p.235

5

Intro:

Knowing all of the rules of English grammar and spelling will not help you give directions from place A to place B if you do not know how to get there.

In systems, an analyst can describe a method in more abstract terms to a programmer without knowing the exact syntax of the programming language.

Programs are typically based on one or more algorithms.

6

An algorithm is an abstract and formal step-by-step recipe that tells how to perform a certain task or solve a certain problem on a computer.

Pseudocode expresses a solution in a loosely formatted style that resembles the actual code (here, Java) but without the strict syntax. This is the shorthand that developers use to flesh out a solution.

7

When handling large volumes of data, it makes sense to form an acceptable algorithm that will work effectively with the data. Before you actually implement this algorithm, you need to scope it out and analyze its potential efficiency.

8

Therefore, two algorithms are imperative: one that efficiently orders (sorts) a large volume of data, and another that efficiently searches that data for a specific element, such as a specific driver’s information looked up by SSN.

9

We will cover the following topics:

A Gentle Introduction to Big-O
Sequential Search Algorithm
A Need for Order
Bubble Sort (a Review)
Selection Sort
Binary Search

10

A Gentle Introduction to Big-O:

When we begin to discuss algorithms we MUST be able to evaluate their effectiveness in some way

One way would be to evaluate their execution or pure clock time

This method leaves a tremendous dependency on the power of a specific CPU

11

Also, if the algorithm is inefficient, a powerful CPU can mask the problem only up to a point

We need a more abstract, standard, mechanism for evaluating efficiency

12

We use a more logical Order of Growth methodology, Big-O, to evaluate theoretical efficiency

This method factors out the relative strengths of the system(s) on which a given algorithm executes

The Big-O Growth Rate can be summed up with the following chart:

13

O(1) < O(log n) < O(n) < O(n log n) < O(n^2) < O(n^3) < O(a^n)

[Chart: time t plotted against n, with curves labeled Constant O(1), Linear, N log n, Quadratic and Exponential]

14

As you can see, a constant growth rate is optimum whereas an exponential rate is a problem

What do you think the “N” stands for ?

15

Here is a little comparison chart that illustrates the concept:

N          N^2               N Log(N)
100        10,000            664
300        90,000            2,468
1,000      1,000,000         9,965
100,000    10,000,000,000    1,660,964
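The figures in this chart can be reproduced with a few lines of Java. The sketch below assumes the logarithm is base 2 and that fractional results are truncated, which matches the values shown; the class and method names are illustrative only.

```java
// Sketch: reproduce the comparison chart for N, N^2 and N log2(N).
// GrowthTable and its method names are illustrative, not from the course materials.
class GrowthTable {

    // Base-2 logarithm, computed from natural logarithms.
    static double log2(double n) {
        return Math.log(n) / Math.log(2);
    }

    // N squared, held in a long so that 100,000^2 does not overflow an int.
    static long nSquared(long n) {
        return n * n;
    }

    // N * log2(N), truncated to a whole number of steps.
    static long nLogN(long n) {
        return (long) (n * log2(n));
    }

    public static void main(String[] args) {
        long[] sizes = {100, 300, 1_000, 100_000};
        for (long n : sizes) {
            System.out.printf("%,10d %,16d %,12d%n", n, nSquared(n), nLogN(n));
        }
    }
}
```

Note how quickly the N^2 column outgrows the N Log(N) column as N increases.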

16

We will examine these in depth in our lecture on Big-O

For now, understand that an algorithm that has a Logarithmic efficiency is preferable to a Quadratic algorithm

17

Sequential Search Algorithm:

Consider an example where we have a “database” consisting of only 10 licensed drivers

We can create a “Driver” class and then an array of instances of that class

Order really does not matter since we have only 10 drivers to search
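As a rough illustration, a Driver class might look like the sketch below. The fields, the constructor and the sample values are assumptions made for this example; only getSSN and compareTo are names used elsewhere in this lecture.

```java
// Sketch of a Driver class. The fields and constructor are assumptions;
// the lecture only relies on getSSN() and compareTo().
class Driver implements Comparable<Driver> {
    private final String ssn;
    private final String name;

    Driver(String ssn, String name) {
        this.ssn = ssn;
        this.name = name;
    }

    String getSSN() {
        return ssn;
    }

    String getName() {
        return name;
    }

    // Orders drivers by SSN. Comparing the strings works here because
    // every SSN is assumed to have the same number of digits.
    @Override
    public int compareTo(Driver other) {
        return ssn.compareTo(other.ssn);
    }
}

class DriverArrayDemo {
    // Builds a small array of Driver instances with made-up sample data.
    static Driver[] sampleDrivers() {
        Driver[] drivers = new Driver[3];
        drivers[0] = new Driver("555000003", "Sample Driver C");
        drivers[1] = new Driver("555000001", "Sample Driver A");
        drivers[2] = new Driver("555000002", "Sample Driver B");
        return drivers;
    }
}
```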

18

Adding drivers to the array would be efficient as it only takes one “step”:

myDriverArray[2] = new Driver(/* constructor info */);

What do you think the Order of Efficiency would be for the “add” ?

19

ANS: Constant O(1)

20

If we needed to look for a specific driver using their SSN, at most how many “steps” would we need to execute ?

At least ?

21

ANS: At most 10 steps, if the driver is the last item or is not in the array

At least 1 step (the best case), if the driver is the first item

22

This is the essence of a Sequential Search: it iterates over each element in a list and stops either when the item is located or when the end of the list is reached

What do you think the Order of Efficiency is in the best and worst cases ?

23

ANS: In the best case, the driver being searched for is first in the list and the search is Constant, O(1). In the worst case it is Linear, O(N)

24

This search is also known as a Linear Search

How is this coded ?

25

Driver[] myDriver = new Driver[100];   // assume every slot has been filled
String SSN = "123456789";
for (int i = 0; i < myDriver.length; i++) {
    // getSSN() is a method call, and the condition needs parentheses
    if (myDriver[i].getSSN().equals(SSN))
        return i;      // found: return the index
}
return -1;             // reached the end without a match
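To see the best and worst cases in action, here is a self-contained sketch of the same idea that searches a plain array of SSN strings and reports how many elements were examined. The class name, method names and sample values are assumptions made for illustration.

```java
// Sketch: sequential (linear) search over SSN strings, with a step counter.
// LinearSearchDemo and its methods are illustrative names only.
class LinearSearchDemo {

    // Returns the index of target in ssns, or -1 if it is not present.
    static int linearSearch(String[] ssns, String target) {
        for (int i = 0; i < ssns.length; i++) {
            // Calling equals on target avoids a failure if a slot is null.
            if (target.equals(ssns[i])) {
                return i;
            }
        }
        return -1;
    }

    // Number of elements examined before the search stops.
    static int stepsTaken(String[] ssns, String target) {
        int index = linearSearch(ssns, target);
        return (index >= 0) ? index + 1 : ssns.length;
    }
}
```

Searching for the first element takes one step, while searching for the last element or for a missing one takes as many steps as there are elements.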

26

A Need for Order:

Our little search works for 10 drivers, but if our list had 1 million drivers, we could expect our linear search to examine up to 1 million elements EACH time we look for a specific driver

We need a better way to search our list, but before we can think of a more efficient search we need to order the data in a way that can be used in a more advanced search

27

We need to make sure that our list is indexed in a manner such that the sequence of SSNs is ordered from “smallest” to “greatest”

Now, it is important to note that just as there is a “cost” to performing a search against a list there is a “cost” for sorting a list

28

Therefore, we need to evaluate the relative value of sorting a list so that we may execute an efficient search AGAINST simply leaving the list unordered and performing a linear search

29

In order to make a decision we need to know what the Dominant factor or process is in our application

If the list is fairly static and there will be extensive searches for specific drivers then the search is the dominant factor and our solution needs to make sure that the search algorithm is efficient even at the expense of a costly SORT algorithm

30

If the list is dynamic and the searching is infrequent then the inserting or adding algorithm efficiency overrides the search efficiency

We will learn when we discuss Data Structures that this solution requires the evaluation of the efficiency (Big-O) of competing ways to store and maintain data

31

At this point we know of only the array or ArrayList as a potential Data Structure but we will soon cover Linked Lists, Binary Trees and Hash Tables

Let's assume that in our project, the list of licensed drivers will be about 1 million and the list, once loaded, will remain static

32

Let's also assume that there will be frequent requests for information on specific drivers

This information provides us with our solution: we will order the data so that we can provide an efficient method for searching the list
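The trade-off between sorting first and searching an unordered list can be estimated with a rough step count. The sketch below is only a back-of-the-envelope model under stated assumptions: an unsorted search costs about n steps in the worst case, an O(n^2) sort costs about n * n steps once, and a search of the sorted list costs about log2(n) steps. Constant factors are ignored, and all names are illustrative.

```java
// Sketch: rough step-count model for "sort once, then search" versus
// "never sort, always search linearly". Constant factors are ignored.
class SortOrNot {

    // Worst-case steps for q linear searches over n unsorted items.
    static double unsortedCost(double n, double q) {
        return q * n;
    }

    // Steps for one O(n^2) sort plus q logarithmic searches.
    static double sortedCost(double n, double q) {
        double log2n = Math.log(n) / Math.log(2);
        return n * n + q * log2n;
    }

    // True when sorting first is cheaper overall under this model.
    static boolean sortingPaysOff(double n, double q) {
        return sortedCost(n, q) < unsortedCost(n, q);
    }
}
```

Under this model, with 1 million drivers and an O(n^2) sort, sorting first only pays off once the number of searches passes roughly 1 million, which is why the assumption of frequent searches against a static list matters.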

33

There are MANY ways to sort a list (MergeSort, QuickSort, InsertionSort)

We will cover all of them in a later lecture, but for now we will focus on using the Selection Sort, and we will look back at the Bubble Sort

34

Bubble Sort (a Review):

Sort an array in ascending or descending order by evaluating the nth element against the nth+1 element

If they are not in the prescribed order, swap them

Repeat these passes over the array; when a complete pass makes no swaps, all of the items are sorted

How will we sort our Driver class list ?

35

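int c1, c2, leng;
Driver temp;       // temp holds a Driver reference, so it is declared only once

leng = myDriver.length;
for (c1 = 0; c1 < (leng - 1); c1++) {
    // compare slot c1 with every later element
    for (c2 = (c1 + 1); c2 < leng; c2++) {
        // compareTo > 0 means myDriver[c1] belongs after myDriver[c2]
        if (myDriver[c1].compareTo(myDriver[c2]) > 0) {
            temp = myDriver[c1];
            myDriver[c1] = myDriver[c2];
            myDriver[c2] = temp;
        }
    }
}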
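For comparison, here is a minimal sketch of the adjacent-element form described above, where the nth element is compared with the nth+1 element and passes repeat until nothing is swapped. It sorts a plain int array rather than Driver objects to stay self-contained; the class and method names are illustrative.

```java
// Sketch: bubble sort using adjacent comparisons, ascending order.
// BubbleSortDemo is an illustrative name, not part of the course code.
class BubbleSortDemo {

    // Sorts the array in place.
    static void bubbleSort(int[] a) {
        boolean swapped = true;
        // Each pass pushes the largest remaining value to position 'end'.
        // If a pass makes no swaps, the array is already sorted and we stop.
        for (int end = a.length - 1; end > 0 && swapped; end--) {
            swapped = false;
            for (int i = 0; i < end; i++) {
                if (a[i] > a[i + 1]) {
                    int temp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = temp;
                    swapped = true;
                }
            }
        }
    }
}
```

The early exit helps on lists that are already nearly sorted, but the nested loops still make the worst case O(n^2).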

36

What are we actually Swapping here ?

This sort has a nested for loop

This means that for each element of the list, the inner loop is executed

In effect, the number of steps grows in proportion to the number of elements squared (roughly n^2 / 2 comparisons)

That’s why we call this an O(n^2) sort

37

Selection Sort:

An algorithm for arranging the elements of an array in (ascending) order

Find the largest element in the array and swap it with the last element, then continue the process for the first n-1 elements

38

1st iteration takes the LARGEST ARRAY element and swaps it with the LAST array element

The largest element is now in its correct place and will not be moved again

39

We logically reduce the size of the array and ignore the “last” element(s)

40

Steps in selection sort:

1. Initialize a variable, n, to the size of the array

2. Find the largest among the first n elements

3. Make it swap places with the nth element

4. Decrement n by 1

5. Repeat steps 2 to 4 while n >= 2
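The steps above can be sketched in Java as follows. This version works on a plain int array to stay self-contained; the class and method names are illustrative, and adapting it to Driver objects would mean replacing the > comparison with compareTo.

```java
// Sketch: selection sort (ascending) following the steps listed above.
// SelectionSortDemo is an illustrative name only.
class SelectionSortDemo {

    // Steps 2 and 3: find the largest among the first n elements
    // and swap it with the nth element (index n - 1).
    static void moveLargestToEnd(int[] a, int n) {
        int largest = 0;
        for (int i = 1; i < n; i++) {
            if (a[i] > a[largest]) {
                largest = i;
            }
        }
        int temp = a[largest];
        a[largest] = a[n - 1];
        a[n - 1] = temp;
    }

    // Steps 1, 4 and 5: start with n at the array size,
    // shrink it by one after each pass, and stop when n drops below 2.
    static void selectionSort(int[] a) {
        for (int n = a.length; n >= 2; n--) {
            moveLargestToEnd(a, n);
        }
    }
}
```

After each pass, the element at index n - 1 is in its final place, so the next pass can ignore it.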

41

SELECTION SORT OUTPUT:
initial array:
2 97 17 37 12 46 10 55 80 42 39
selection sort in progress...
2 39 17 37 12 46 10 55 80 42 97
2 39 17 37 12 46 10 55 42 80 97
2 39 17 37 12 46 10 42 55 80 97
2 39 17 37 12 42 10 46 55 80 97

42

SELECTION SORT OUTPUT:
initial array:
2 97 17 37 12 46 10 55 80 42 39
selection sort in progress...

2 10 12 17 37 39 42 46 55 80 97

43

The same procedure can be used to sort the array in descending order by finding the SMALLEST element in the array

44

For the same reasons as the bubble sort, this is also an O(n^2) sort

45

Once we have sorted the list, there is no need to apply a linear (Sequential) search unless you need to accumulate data about each driver in the list

We are now free to apply an efficient search algorithm

46

Binary Search:

The concept of a Binary search is that it repeatedly & logically divides the list in half until the element is found or the logical size of the list shrinks to zero

It is an algorithm for quickly finding the element with a given value in a sorted array

47

Used to find the location of a given “target” value in an array

Works on sorted arrays. Unsorted arrays need to be searched element by element

48

Take a sorted (ascending) array of n elements and search for a given value, x

49

Locate the middle element

Compare that element with x

A match ends the search

50

If x is smaller, the target element is in the LEFT half of the array

51

If x is larger, the target element is in the RIGHT half of the array

52

With each iteration we narrow the search by 50%

The search eventually ends when a match is found or “right - left” becomes negative (target not found)

53

IN GENERAL, An array of (2^n) –1 elements requires, at MOST, n comparisons

54

For example: a 7-element array will require at most 3 iterations: (2^3) - 1

7 = 2 to the 3rd power minus 1, therefore n = 3

55

left set to ZERO
right set to array length - 1
middle = (left + right) / 2

if target val > middle val
    left = middle + 1

if target val < middle val
    right = middle - 1
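A minimal Java sketch of this logic on a sorted int array is shown below. It is an illustration rather than a solution for the Driver class, and the class and method names are assumptions. Note that the middle is recomputed on every iteration, and that when the target is smaller than the middle value the right boundary moves to middle - 1.

```java
// Sketch: iterative binary search on an ascending int array.
// BinarySearchDemo is an illustrative name only.
class BinarySearchDemo {

    // Returns the index of target, or -1 if it is not in the array.
    static int binarySearch(int[] sorted, int target) {
        int left = 0;
        int right = sorted.length - 1;
        while (left <= right) {
            // Equivalent to (left + right) / 2 but cannot overflow an int.
            int middle = left + (right - left) / 2;
            if (sorted[middle] == target) {
                return middle;              // a match ends the search
            }
            if (target > sorted[middle]) {
                left = middle + 1;          // continue in the RIGHT half
            } else {
                right = middle - 1;         // continue in the LEFT half
            }
        }
        return -1;                          // left passed right: not found
    }
}
```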

56

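TARGET: 12

1 4 7 8 12 16 21

---- middle is 6/2 or 3 //6 is 0 + 6
---- element 3 is valued at 8
---- 12 is greater than 8 so set left to 4 and change the middle to 10/2 or 5 //10 is 4 + 6
---- element 5 is valued at 16
---- 12 is less than 16 so set right to 4 and change the middle to 8/2 or 4 //8 is 4 + 4
---- element 4 is valued at 12
---- 12 (target) = 12 (element 4, the 5th array element)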

57

An array of 15 elements requires a max of 4 iterations (2^4) - 1

An array of 1,000,000 elements requires a max of 20 iterations (2^20) - 1
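These maximums can be checked with a short calculation: find the smallest n for which (2^n) - 1 is at least the array size. The sketch below does exactly that; the class and method names are illustrative.

```java
// Sketch: worst-case number of binary search iterations for a given array size.
// BinarySearchBound is an illustrative name only.
class BinarySearchBound {

    // Returns the smallest k such that (2^k) - 1 >= size.
    static int maxIterations(long size) {
        int k = 0;
        long capacity = 0;                  // holds (2^k) - 1
        while (capacity < size) {
            capacity = capacity * 2 + 1;    // (2^(k+1)) - 1
            k++;
        }
        return k;
    }
}
```

For 1,000,000 elements the loop stops at k = 20, because (2^19) - 1 is 524,287 and (2^20) - 1 is 1,048,575.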

58

Let's revisit and update our performance chart to see the efficiency of a Logarithmic algorithm

N          N^2               N Log(N)     Log(N)
100        10,000            664          6.64
300        90,000            2,468        8.22
1,000      1,000,000         9,965        9.97
100,000    10,000,000,000    1,660,964    16.61

59

TPS:

Write your own Binary Search that will work against your Driver class

60

Tips for the AP Exam:

Given an algorithm, count the number of times a specific statement executes

When writing a Sort algorithm, be aware of the “off by 1” problem: the last valid index is length - 1

An array must be sorted before a Binary Search will work

O(N Log(N)) algorithms are more efficient than O(N^2) algorithms

61

Project:

POE

Documents
