Unit 5: Searching, Sorting and Merging

UNIT 5

Prof. Sushant S Sundikar 1

• The related activities of sorting, searching and merging are central to many computer applications.

• Sorting and merging provide us with a means of organizing information; searching methods take advantage of the organization of information and thereby reduce the amount of effort needed either to locate a particular item or to establish that it is not present in a data set.

• Sorting algorithms arrange items in a set according to a predefined ordering relation.

• The two most common types of data are string information and numerical information.

• The ordering relation for numeric data simply involves arranging items in sequence from smallest to largest such that each item is less than or equal to its immediate successor. This ordering is referred to as non-descending order. (The reverse arrangement, from largest to smallest, is non-ascending order.)

• Sorted string information is generally arranged in standard lexicographical or dictionary order.

• Sorting algorithms usually fall into one of two classes:
  • The simpler and less sophisticated algorithms require of the order of n² comparisons (i.e. O(n²)) to sort n items.
  • The advanced sorting algorithms take of the order of n log₂ n (i.e. O(n log₂ n)) comparisons to sort n items of data. Algorithms within this set come close to the optimum possible performance for sorting random data.

• Problem:
  • Merge two arrays of integers, both with their elements in ascending order, into a single ordered array.

• Algorithm Development
  • Merging two or more sets of data is a task that is frequently performed in computing.
  • It is simpler than sorting because it is possible to take advantage of the partial order in the data.
  • Examination of two ordered arrays should help us discover the essentials of a suitable merging procedure.
  • Consider the two arrays:
  • A little thought reveals that the merged result should be as indicated below:
  • The origin (a or b) is written above each element in the c array.
  • What we see here is that c is longer than both a and b.
  • In fact c must contain a number of elements equal to the sum of the numbers of elements in a and b (i.e. m + n, where a has m elements and b has n).
  • To see how this might be done let us consider the smallest merging problem: two arrays of one element each.
  • To merge two one-element arrays all we need to do is select the smaller of the a and b elements and place it in c.
  • The larger element is then placed in c.
  • For example, 8 is less than 15, so 8 takes the c[1] place and 15 the c[2] place.
  • In the same way we start merging arrays of lengths m and n.
  • The comparison between a[1] and b[1] allows us to set c[1].
  • After placing 8 in c[1] we need a way of deciding which element must be placed next in the c array.
  • In the general case the next element to be placed into c is always the smaller of the first elements in the unmerged parts of arrays a and b.
  • To keep track of the "yet to be merged" parts of both the a and b arrays, two index pointers i and j are needed.
  • As an element is selected from either a or b, the appropriate pointer must be incremented by 1.
  • Overall the entire process would be:
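As one way to picture the merging idea described above, here is a minimal Python sketch. It assumes 0-based Python lists rather than the 1-based arrays of the notes, and the function name `merge` and the second sample input are illustrative choices, not taken from the slides.

```python
def merge(a, b):
    """Merge two ascending lists a and b into one ascending list c."""
    c = []
    i = 0  # index of the first unmerged element of a
    j = 0  # index of the first unmerged element of b
    # While both lists still have unmerged elements, take the smaller head.
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            c.append(a[i])
            i += 1
        else:
            c.append(b[j])
            j += 1
    # One list is exhausted; copy whatever remains of the other.
    c.extend(a[i:])
    c.extend(b[j:])
    return c


if __name__ == "__main__":
    print(merge([8], [15]))              # the smallest merging problem
    print(merge([2, 9, 20], [1, 9, 30, 40]))
```

The result always has len(a) + len(b) elements, matching the m + n observation above.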

• Algorithm Description

• Algorithm

• Applications
  • Sorting
  • Tape sorting
  • Data processing

• Problem
  • Given a randomly ordered set of n numbers, sort them into non-descending order using an exchange method.

• Almost all sorting methods rely on exchanging data to achieve the desired ordering.

• The method we will now consider relies heavily on an exchange mechanism.

• Suppose we start out with the following random data set:

• We notice that the first two elements are "out of order".

• If 30 and 12 are exchanged we will have the following configuration:

• After seeing the above result we see that the order in the data can be increased further by now comparing and swapping the second and third elements.

• With this new change we get the configuration:

• The investigation we have made suggests that the order in the array can be increased using the following steps:
  • For all adjacent pairs in the array do
    • If the current pair of elements is not in non-descending order then exchange the two elements.

• After applying this idea to all adjacent pairs in our current data set we get the configuration below:

• Each pass moves the largest remaining element to its final position at the end of the array. Since there are n elements in the data, this implies that at most (n-1) passes (of decreasing length) must be made through the array to complete the sort.
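The pass structure described above can be sketched in Python as follows. This is a minimal sketch assuming 0-based lists; the early exit when a pass makes no exchange mirrors the `sorted` flag used in the pseudocode listing later in this unit, and the sample data is illustrative.

```python
def bubble_sort(a):
    """Sort list a in place into non-descending order by adjacent exchanges."""
    n = len(a)
    for i in range(1, n):            # at most n-1 passes
        swapped = False
        for j in range(n - i):       # each pass is one comparison shorter
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:              # no exchange: the array is already sorted
            break
    return a


if __name__ == "__main__":
    print(bubble_sort([30, 12, 18, 8, 14, 41, 3, 39]))
```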

• Algorithm Description

• Algorithm

• Applications
  • Only for sorting data in which a small percentage of elements are out of order.

• Problem
  • Given a randomly ordered set of n numbers, sort them into non-descending order using an insertion method.

• This is a simple sorting algorithm that builds the final sorted array (or list) one item at a time.

• Insertion sort iterates, consuming one input element on each repetition and growing a sorted output list.

• On each iteration, insertion sort removes one element from the input data, finds the location where it belongs within the sorted list, and inserts it there.

• It repeats until no input elements remain.

• Sorting is typically done in place, by iterating up the array and growing the sorted list behind the current position.

• At each array position, it checks the value there against the largest value in the sorted list (which happens to be next to it, in the previous array position checked).

• If the value is larger, it leaves the element in place and moves to the next.

• If the value is smaller, it finds the correct position within the sorted list, shifts all the larger values up to make a space, and inserts the value into that correct position.

• After k iterations the resulting array has the property that its first k + 1 entries are sorted ("+1" because the first entry is skipped).

• In each iteration the first remaining entry of the input is removed and inserted into the result at the correct position, thus extending the result:

• becomes

• with each element greater than x copied to the right as it is compared against x.

• To understand this sorting algorithm let us take an example.

• Our complete algorithm can now be described as:

• To perform an insertion sort, begin at the left-most element of the array and invoke Insert to insert each element encountered into its correct position.

• The ordered sequence into which the element is inserted is stored at the beginning of the array in the set of indices already examined.

• Each insertion overwrites a single value: the value being inserted.
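A minimal Python sketch of the insertion idea described above is given here. It assumes 0-based lists and uses an explicit bounds check (`j > 0`) in place of the sentinel technique used in the pseudocode listing later in this unit; the sample data is illustrative.

```python
def insertion_sort(a):
    """Sort list a in place into non-descending order by insertion."""
    for i in range(1, len(a)):       # a[0..i-1] is already sorted
        x = a[i]                     # element to be inserted
        j = i
        # Shift every larger element one place to the right.
        while j > 0 and x < a[j - 1]:
            a[j] = a[j - 1]
            j -= 1
        a[j] = x                     # drop x into the gap
    return a


if __name__ == "__main__":
    print(insertion_sort([20, 35, 18, 8, 14, 41, 3, 39]))
```

Because only elements larger than x are shifted, equal elements keep their original relative order.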

• Algorithm description

• Algorithm

• Applications
  • Where there are relatively small data sets.
  • It is sometimes used for this purpose within the more advanced quicksort algorithm.

• Problem
  • Given a randomly ordered set of n numbers, sort them into non-descending order using Shell's diminishing increment insertion method.

• Algorithm development
  • A comparison of random and sorted data sets indicates that, for an array of size n, elements need to travel on average a distance of about n/3 places.
  • This observation suggests that progress towards the final sorted order will be quicker if elements are compared and moved initially over longer rather than shorter distances.
  • This strategy has the effect (on average) of placing each element closer to its final position earlier in the sort.
  • A strategy that moves elements over long distances is to take an array of size n and start comparing elements over a distance of n/2, and then successively over the distances n/4, n/8, n/16, …, 1.
  • Consider what happens when the n/2 idea is applied to the data set below:

• After comparisons and exchanges over the distance n/2 we have n/2 chains of length two that are sorted.

• The next step is to compare elements over a distance n/4 and thus produce two sorted chains of length 4.

• Notice that after the n/4 sort the "amount of disorder" in the array is relatively small.

• The final step is to form a single sorted chain of length 8 by comparing and sorting elements a distance 1 apart.

• Since the relative disorder in the array is small towards the end of the sort (i.e. when we are n/8-sorting in this case), we should choose as our method for sorting the chains an algorithm that is efficient for sorting partially ordered data.

• The insertion sort should be better because it does not rely so heavily on exchanges.

• The next and most important consideration is to apply insertion sorts over the following distances: n/2, n/4, n/8, …, 1.

• We can implement this as follows:

• The next steps in the development are to establish how many chains are to be sorted for each increment gap, and then to work out how to access the individual chains for insertion sorting.

• We can therefore expand our algorithm to:

• Now comes the most crucial stage, the insertion sort itself.

• In the standard implementation the first element that we try to insert is the second element in the array.

• Here, for each chain to be sorted, it will need to be the second element of that chain.

• The position of k can be given by: k <- j + inc

• Successive members of each chain beginning with j can be found using: k <- k + inc
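The diminishing increment scheme developed above can be sketched in Python as follows. This is a minimal sketch assuming 0-based lists and the halving sequence n/2, n/4, …, 1 with integer division; the names `inc`, `j`, `k`, `x` and `current` follow the pseudocode listing later in this unit, and the sample data is illustrative.

```python
def shell_sort(a):
    """Sort list a in place using insertion sorts over diminishing increments."""
    n = len(a)
    inc = n // 2
    while inc >= 1:
        for j in range(inc):                 # one chain starts at each j < inc
            for k in range(j + inc, n, inc): # second and later chain members
                x = a[k]                     # value to insert into its chain
                current = k
                # Shift larger chain members inc places to the right.
                while current - inc >= j and x < a[current - inc]:
                    a[current] = a[current - inc]
                    current -= inc
                a[current] = x
        inc //= 2                            # halve the gap; ends after inc == 1
    return a


if __name__ == "__main__":
    print(shell_sort([20, 35, 18, 8, 14, 41, 3, 39]))
```

The final pass with inc == 1 is an ordinary insertion sort, so the result is fully sorted whatever the earlier passes achieved.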

• Algorithm description

• Algorithm
  • Shellsort.txt

• Applications
  • Works well for sorting large data sets, but there are more advanced methods with better performance.

• Problem
  • Given a randomly ordered set of n numbers, sort them into non-descending order using Hoare's partitioning method.

• Algorithm development
  • Take a guess and select an element that might allow us to distinguish between the big and the small elements.
  • After the first pass we have all the big elements in the right part of the array and all the small elements in the left part of the array.
  • To achieve this do the following:
    • Extend the two partitions inwards until a wrongly partitioned pair is encountered.
    • While the two partitions have not crossed
      • Exchange the wrongly partitioned pair;
      • Extend the two partitions inwards again until another wrongly partitioned pair is encountered.
  • Applying these ideas to the sample data set:
  • Element 18 is selected as the pivot element.
  • This step has given us two independent sets of elements which can be sorted independently.
  • The basic mechanism for sorting the partitions is:
    • While all partitions are not reduced to size one do:
      • Choose the next partition to be processed;
      • Select a new partitioning value from the current partition;
      • Partition the current partition into two smaller partially ordered sets.

• Refining this mechanism, while partitions larger than size one remain, do the following:
  a. Choose the smaller partition to be processed next;
  b. Select the element in the middle of the partition as the partitioning value;
  c. Partition the current partition into two partially ordered sets;
  d. Save the larger of the partitions from step c for later processing.
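The refined mechanism above can be sketched in Python as follows. This is a minimal sketch assuming 0-based lists; it uses a common formulation of the inward-scanning partition loop rather than a line-by-line transcription of the pseudocode listing later in this unit, and a Python list of (left, right) pairs stands in for the explicit stack. The sample data is illustrative.

```python
def quicksort(a):
    """Sort list a in place by partitioning, using an explicit stack."""
    stack = [(0, len(a) - 1)]            # partitions saved for later
    while stack:
        left, right = stack.pop()
        while left < right:              # partition still larger than size one
            pivot = a[(left + right) // 2]   # middle element as the guess
            i, j = left, right
            while i <= j:
                while a[i] < pivot:      # extend the left partition inwards
                    i += 1
                while a[j] > pivot:      # extend the right partition inwards
                    j -= 1
                if i <= j:               # wrongly partitioned pair: exchange
                    a[i], a[j] = a[j], a[i]
                    i += 1
                    j -= 1
            # a[left..j] <= pivot and a[i..right] >= pivot.
            # Process the smaller partition next; save the larger one.
            if j - left < right - i:
                stack.append((i, right))
                right = j
            else:
                stack.append((left, j))
                left = i
    return a


if __name__ == "__main__":
    print(quicksort([20, 35, 18, 8, 14, 41, 3, 39]))
```

Processing the smaller partition first keeps the number of saved partitions small, which is the point of step d above.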

• Algorithm description

• Algorithm

• Applications
  • Internal sorting of large data sets.

• Problem
  • Given an element x and a set of data that is in strictly ascending numerical order, find whether or not x is present in the set.

• Algorithm Development:
  • Let us now consider an example in order to work out the details of the algorithm we need to implement.
  • Suppose we are required to search an array of 15 ordered elements to find whether x = 44 is present. If it is present, return the position in the array that contains 44.

• We start by examining the middle value in the array.

• To get the middle index of an array with limits lower = 1 and upper = n we can use (with integer division)
  middle <- (lower + upper) / 2;

• For the above problem the middle index is (1 + 15) / 2 = 8.

• This gives a[middle] = a[8] = 39.

• Since the value we are seeking is greater than 39, it must be somewhere in the range a[9] … a[15].

• That is, 9 becomes the lower limit and 15 remains the upper limit.
  • lower = middle + 1

• We then have:

• To calculate the new middle index: (9 + 15) / 2 = 12.

• a[12] = 49 > 44, so we search in a[9] … a[11]; that is, upper = middle - 1.

• Using the same procedure, calculate the middle position: (9 + 11) / 2 = 10.

• Our middle position is 10 and a[10] contains 44, which matches the key to be found.

• Hence return the position 10.
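The halving procedure walked through above can be sketched in Python as follows. This is a minimal sketch assuming 0-based lists, so the position it returns is one less than the 1-based position quoted in the notes. The sample array is illustrative data chosen only so that 39, 44 and 49 sit at 1-based positions 8, 10 and 12 as in the text; it is not taken from the slides.

```python
def binary_search(a, x):
    """Return the 0-based index of x in ascending list a, or None if absent."""
    lower, upper = 0, len(a) - 1
    while lower <= upper:
        middle = (lower + upper) // 2
        if a[middle] == x:
            return middle
        if a[middle] < x:
            lower = middle + 1       # x can only be in the upper part
        else:
            upper = middle - 1       # x can only be in the lower part
    return None                      # limits crossed: x is not present


if __name__ == "__main__":
    # Illustrative 15-element array (hypothetical values).
    data = [10, 12, 20, 23, 27, 30, 31, 39, 42, 44, 45, 49, 57, 63, 70]
    print(binary_search(data, 44))   # 0-based index of 44
    print(binary_search(data, 43))   # not present
```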

• Algorithm Description

• Problem
  • Design and implement a hash searching algorithm.

• Algorithm Description

• Algorithm
  • Hashsearch.txt

• Applications
  • Fast retrieval from both small and large tables.

Unit 5 Algorithms

1. Two Way Merge

ALGORITHM merge(a,b,c,m,n)
//PROBLEM STATEMENT: Merge two arrays of integers, both with their
//                   elements in ascending order, into a single ordered array.
//INPUT: a : integer array with m elements and size m as integer
//       b : integer array with n elements and size n as integer
//OUTPUT: sorted array c with m+n elements.
{
    // Pass first the array whose last element is smaller,
    // so that it is the one exhausted first during merging.
    if (a[m] <= b[n]) then
        mergecopy(a,b,c,m,n);
    else
        mergecopy(b,a,c,n,m);
}

ALGORITHM mergecopy(a,b,c,m,n)
{
    // i : first position in a array
    // j : current position in b array
    // k : current position in merged array - initially 1
    i <-- 1;
    j <-- 1;
    k <-- 1;
    if (a[m] <= b[1]) then
    {
        // every element of a precedes every element of b
        copy(a,c,i,m,k);
        copy(b,c,j,n,k);
    }
    else
    {
        shortmerge(a,b,c,m,j,k);
        copy(b,c,j,n,k);
    }
}

ALGORITHM copy(b,c,j,n,k)
// Copies b[j..n] into c starting at c[k]; k is updated for the caller.
{
    for i <-- j to n do
    {
        c[k] <-- b[i];
        k <-- k+1;
    }
}

ALGORITHM shortmerge(a,b,c,m,j,k)
// Merges all of a[1..m] with b until a is exhausted;
// j and k are updated for the caller.
{
    i <-- 1;
    while i <= m do
    {
        if a[i] <= b[j] then
        {
            c[k] <-- a[i];
            i <-- i + 1;
        }
        else
        {
            c[k] <-- b[j];
            j <-- j + 1;
        }
        k <-- k+1;
    }
}

2. Sort by Exchange

ALGORITHM bubblesort(a,n)
//PROBLEM STATEMENT: Given a randomly ordered set of n numbers, sort
//                   them into non-descending order using an exchange method.
//INPUT: a : integer array with n elements and size n as integer
//OUTPUT: sorted array a.
{
    i <- 0;
    sorted <- false;
    // at most n-1 passes; stop early if a pass makes no exchange
    while (i < n-1) AND (NOT sorted) do
    {
        sorted <- true;
        i <- i + 1;
        for j <- 1 to n-i do
        {
            if a[j] > a[j+1] then
            {
                t <- a[j];
                a[j] <- a[j+1];
                a[j+1] <- t;
                sorted <- false;
            }
        }
    }
    return a;
}

3. Sorting By Insertion

ALGORITHM insertionsort(a,n)
//PROBLEM STATEMENT: Given a randomly ordered set of n numbers, sort
//                   them into non-descending order using an insertion method.
//INPUT: a - array of unsorted elements
//       n - size of array
//OUTPUT: sorted array a.
//VARIABLES: i - increasing index of number of elements ordered in each stage
//           j - decreasing index used for searching insertion position
//           first - smallest element in array
//           p - original position of smallest element
//           x - current element to be inserted
{
    // FIND MINIMUM TO ACT AS SENTINEL
    first <- a[1];
    p <- 1;
    for i <- 2 to n do
    {
        if a[i] < first then
        {
            first <- a[i];
            p <- i;
        }
    }
    // move the minimum to a[1] once the scan is complete
    a[p] <- a[1];
    a[1] <- first;
    // inserting ith element - note a[1] is a sentinel, so a[1..2] is already ordered
    for i <- 3 to n do
    {
        x <- a[i];
        j <- i;
        while x < a[j-1] do
        {
            a[j] <- a[j-1];
            j <- j - 1;
        }
        a[j] <- x;
    }
    return a;
}

4. Sorting by diminishing increment

ALGORITHM shellsort(a, n)
//PROBLEM STATEMENT: Given a randomly ordered set of n numbers, sort them into
//                   non-descending order using Shell's diminishing increment
//                   insertion method.
//INPUT: a - integer array of size n
//OUTPUT: Sorted array a.
{
    //variable description
    // inc - step size at which elements are to be sorted.
    // current - position in chain where x is finally inserted.
    // previous - index of element currently being compared with x
    // j - index for lowest element in current chain being sorted.
    // k - index of current element being inserted
    // x - current value to be inserted
    // inserted - is true when insertion can be made
    inc <- n;
    while inc > 1 do
    {
        inc <- inc / 2;    // integer division
        for j <- 1 to inc do
        {
            k <- j + inc;
            while k <= n do
            {
                inserted <- false;
                x <- a[k];
                current <- k;
                previous <- current - inc;
                while (previous >= j) and (not inserted) do
                {
                    //locate the position and perform insertion of x
                    if x < a[previous] then
                    {
                        a[current] <- a[previous];
                        current <- previous;
                        previous <- previous - inc;
                    }
                    else
                    {
                        inserted <- true;
                    }
                }
                a[current] <- x;
                k <- k + inc;
            }
        }
    }
    return a;
}

5. Sorting By Partitioning

ALGORITHM quicksort(a, n, stacksize)
//INPUT: a - an integer array of size n
//OUTPUT: a sorted array
{
    //variables used in the algorithm
    //left - upper limit of left partition
    //right - lower limit of right partition
    //newleft - upper limit of extended left partition
    //newright - lower limit of extended right partition
    //middle - middle index of current partition
    //mguess - current guess at median
    //temp - temporary variable used for exchange
    //stacktop - current top of stack
    //stack - array[1..100] of integers
    stacktop <- 2;
    stack[1] <- 1;
    stack[2] <- n;
    while stacktop > 0 do
    {
        // take the next saved partition off the stack
        right <- stack[stacktop];
        left <- stack[stacktop - 1];
        stacktop <- stacktop - 2;
        while (left < right) do
        {
            newleft <- left;
            newright <- right;
            middle <- (left + right) / 2;    // integer division
            mguess <- a[middle];
            while a[newleft] < mguess do newleft <- newleft + 1;
            while a[newright] > mguess do newright <- newright - 1;
            while newleft < newright - 1 do
            {
                temp <- a[newleft];
                a[newleft] <- a[newright];
                a[newright] <- temp;
                newleft <- newleft + 1;
                newright <- newright - 1;
                while a[newleft] < mguess do newleft <- newleft + 1;
                while a[newright] > mguess do newright <- newright - 1;
            }
            if newleft <= newright then
            {
                if newleft < newright then
                {
                    temp <- a[newleft];
                    a[newleft] <- a[newright];
                    a[newright] <- temp;
                }
                newleft <- newleft + 1;
                newright <- newright - 1;
            }
            if newright < middle then
            {
                // left partition is smaller: save the right one, process the left
                stack[stacktop+1] <- newleft;
                stacktop <- stacktop + 2;
                stack[stacktop] <- right;
                right <- newright;
            }
            else
            {
                // right partition is smaller: save the left one, process the right
                stack[stacktop+1] <- left;
                stacktop <- stacktop + 2;
                stack[stacktop] <- newright;
                left <- newleft;
            }
        }
    }
    return a;
}

6. Binary Search

7. Hash Searching

ALGORITHM hashsearch(table,position,found,tablesize,empty,key)
PROBLEM STATEMENT: Design and implement a hash searching algorithm.
INPUT: table - hash table to be searched (indexed 0 to tablesize-1)
       tablesize - an integer to set the size of the table
       key - key to be searched
       empty - an integer for empty value
OUTPUT: found - boolean value set to true if the key is found, false otherwise
        position - position of the key element when found is true.
VARIABLES: temp - temporary storage for value at position start
           start - hash value index to table
           active - if true continue search of table
{
    active <-- true;
    found <-- false;
    start <-- key mod tablesize;
    position <-- start;
    if table[start] = key then
    {
        active <-- false;
        found <-- true;
        temp <-- table[start];
    }
    else
    {
        // place the key at start as a sentinel so the search must terminate
        temp <-- table[start];
        table[start] <-- key;
    }
    while active do
    {
        position <-- (position + 1) mod tablesize;
        if table[position] = key then
        {
            active <-- false;
            // reaching the sentinel at start means the key was not in the table
            if position <> start then
                found <-- true;
        }
        else
        {
            if table[position] = empty then
            {
                active <-- false;
            }
        }
    }
    // restore the value overwritten by the sentinel
    table[start] <-- temp;
}
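For comparison, a minimal Python sketch of hash searching with linear probing is given here. It is a sketch under stated assumptions, not a transcription of the listing above: it assumes non-negative integer keys, uses `None` as the empty marker, and stops on an explicit wrap-around check instead of the temporary sentinel stored at table[start]. The helper `hash_insert`, used only to build a sample table, and the sample keys are hypothetical.

```python
EMPTY = None  # marker for an unused table slot (assumption of this sketch)


def hash_insert(table, key):
    """Hypothetical helper: store key using linear probing; return its slot."""
    size = len(table)
    start = key % size
    pos = start
    while table[pos] is not EMPTY and table[pos] != key:
        pos = (pos + 1) % size
        if pos == start:
            raise ValueError("hash table is full")
    table[pos] = key
    return pos


def hash_search(table, key):
    """Return the slot holding key, or None if key is not in the table."""
    size = len(table)
    start = key % size
    pos = start
    while True:
        if table[pos] == key:
            return pos               # found
        if table[pos] is EMPTY:
            return None              # an empty slot ends the probe sequence
        pos = (pos + 1) % size
        if pos == start:
            return None              # wrapped around a full table


if __name__ == "__main__":
    table = [EMPTY] * 7
    for k in (10, 17, 24, 5):
        hash_insert(table, k)
    print(hash_search(table, 24), hash_search(table, 31))
```

Keys 10, 17 and 24 all hash to slot 3 in a table of size 7, so the example exercises the probing step as well as the direct hit.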