Pseudo Code

What Is Pseudocode?

An algorithm is a sequence of instructions to solve a well-formulated computational problem specified in terms of its input and output. An algorithm uses the input to generate the output. For example, the algorithm PatternCount uses strings Text and Pattern as input to generate the number Count(Text, Pattern) as its output. In order to solve a computational problem, you need to carry out the instructions specified by the algorithm. For example, if we want you to count how many times Pattern appears in Text, we could tell you to do the following:

Counting a pattern in a text:

1. Start from the first position of Text and check whether Pattern appears in Text starting at its first position.
2. If yes, draw a dot on a piece of paper.
3. Move to the second position of Text and check whether Pattern appears in Text starting at its second position.
4. If yes, draw another dot on the same piece of paper.
5. Continue until you reach the end of Text.
6. Count the number of dots on the piece of paper.
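For contrast, the six steps above might be made precise along the following lines. This is only a sketch in Python, not the book's own PatternCount: the function name pattern_count and the variable last_start are our own choices, positions are counted from 0 rather than 1 (as Python does), and Pattern is assumed to be non-empty.

```python
def pattern_count(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0  # the "blank piece of paper": no dots yet
    # "The end of Text" made precise: the last position at which
    # a full copy of pattern can still start.
    last_start = len(text) - len(pattern)
    for i in range(last_start + 1):
        # Does pattern appear in text starting at position i?
        if text[i:i + len(pattern)] == pattern:
            count += 1  # draw another dot
    return count  # count the dots


print(pattern_count("GCGCG", "GCG"))  # overlapping matches are both counted
```

Note how the code has to settle two things the written steps leave open: the counter starts at zero, and the loop stops at an explicitly computed position.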

Since humans are slow, make mistakes, and hate repetitive work, they invented computers, which are fast, do not mind repetitive work, and hardly ever make mistakes. However, while you may find our instructions for counting a pattern in a text clear, no computer in the world would be able to execute them. These instructions are imprecise, and the only reason you can understand them is that you have been trained for many years to understand human language. For example, we haven’t specified that you should start from a blank piece of paper (without any dots), but you have assumed it. Nor have we specified what it means “to reach the end of Text”, i.e., at what position of Text should we stop?

Because computers do not understand human language, algorithms must be rephrased in a programming language (such as Python, Java, C++, Perl, Ruby, Go, or dozens of others) in order to give specific instructions to the computer. However, we don’t want to describe algorithms in a specific language like Python because your favorite language may be Java. Even more importantly, our focus is on algorithmic ideas rather than on implementation details, which is why we want to meet you halfway between human language and a programming language.

Pseudocode is a way to describe algorithms that emphasizes ideas rather than implementation details and is neither too vague (like human languages) nor too formal (like programming languages). Pseudocode ignores many of the details that are required in a programming language, but it is more precise and less ambiguous than, say, a cookbook recipe like the one below:

Pumpkin Pie Recipe:

1. Combine pumpkin (2 cups), sugar (1 teaspoon), salt (1 teaspoon), ginger (1 teaspoon), cinnamon (2 teaspoons), molasses (2 tablespoons), eggs (3) and milk (12 ounces).
2. Mix thoroughly.
3. Pour into pie crust (1) and bake in oven (425 degrees Fahrenheit) until knife inserted comes out clean.

A computer would be lost seeing a recipe like this: it does not even specify that you need a bowl to mix the ingredients! How would a computer know this? To illustrate the difference between pseudocode and human language, below is our attempt at pseudocode for baking a pumpkin pie.

PumpkinPie(pumpkin, sugar, salt, spices, eggs, milk, crust, bowl)
    PreheatOven(425)
    Add(pumpkin, bowl)
    Add(sugar, bowl)
    Add(salt, bowl)
    Add(spices, bowl)
    Add(bowl)
    Add(eggs, bowl)
    Add(milk, bowl)
    Stir(bowl)
    filling ← contents of bowl
    pie ← Pour(crust, filling)
    while knife inserted does not come out clean
        BakeInOven(pie)
    output “Pumpkin pie is ready!”
    return pie

STOP and Think: Why would PumpkinPie lead to a disaster in the kitchen?

Since computer scientists like brevity, most pseudocode in this book (even pseudocode that describes a complex algorithm!) will be shorter than this pseudocode. Programmers also break their programs into short bite-sized pieces called subroutines. For example, the pseudocode above can be modularized by writing a subroutine MixFilling that moves the lines that fill and stir the bowl (from the first Add through “filling ← contents of bowl”) into their own program.

PumpkinPie(pumpkin, sugar, salt, spices, eggs, milk, crust, bowl)
    PreheatOven(425)
    filling ← MixFilling(pumpkin, sugar, salt, spices, eggs, milk, bowl)
    pie ← Pour(crust, filling)
    while knife inserted does not come out clean
        Bake(pie)
    output “Pumpkin pie is ready!”
    return pie

MixFilling(pumpkin, sugar, salt, spices, eggs, milk, bowl)
    Put(pumpkin, bowl)
    Put(sugar, bowl)
    Put(salt, bowl)
    Put(spices, bowl)
    Put(eggs, bowl)
    Put(milk, bowl)
    Stir(bowl)
    filling ← contents of bowl
    return filling

After the subroutine MixFilling has been written, the main program PumpkinPie (the chef in the kitchen) simply calls this subroutine (a sous-chef) to mix the ingredients. It may appear that we have not reduced the number of lines of code by breaking one program into two, but you will soon see that breaking a program into subroutines greatly helps in revealing the logic of an algorithm and finding mistakes in your code.

Here is a simple program Distance, whose input is four numbers and whose output is one number. Can you guess what it does?

Distance(x1, y1, x2, y2)
    d ← (x2 − x1)² + (y2 − y1)²
    d ← √d
    return d

Distance(x1, y1, x2, y2) computes the Euclidean distance between points on the plane with coordinates (x1, y1) and (x2, y2). For example, Distance(0, 0, 3, 4) returns √((3 − 0)² + (4 − 0)²) = 5 as its output.

An algorithm in pseudocode is denoted by a name (Distance), followed by the list of arguments that it requires (x1, y1, x2, y2); this is followed by the statements that describe the algorithm’s actions. One can invoke an algorithm by passing it the appropriate values for its arguments. For example, Distance(1, 3, 7, 5) would return the distance between points (1, 3) and (7, 5) in two-dimensional space. The operation return reports the result of the algorithm.

The pseudocode Distance uses the concept of a variable, written as x1, or d, or whatever name you choose. For example, the pseudocode below works exactly like the pseudocode above:

Distance(x, y, z, t)
    abracadabra ← (z − x)² + (t − y)²
    abracadabra ← √abracadabra
    return abracadabra
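To show how little separates this pseudocode from a real program, here is a minimal Python sketch of Distance. The lowercase name distance and the use of the standard math module are our own choices; the two assignments to d mirror the pseudocode line by line.

```python
import math


def distance(x1, y1, x2, y2):
    """Euclidean distance between points (x1, y1) and (x2, y2)."""
    d = (x2 - x1) ** 2 + (y2 - y1) ** 2  # sum of squared differences
    d = math.sqrt(d)                     # reassign d to its square root
    return d


print(distance(0, 0, 3, 4))  # 5.0
```

As in the pseudocode, the names of the arguments and of the variable d are arbitrary; renaming them all consistently would not change the result.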

A variable contains some value and can be assigned a new value at different points throughout the course of an algorithm. To assign a new value to a variable we use the expression a ← b that sets the variable a equal to the value b. For example, in the pseudocode above, if you compute Distance(0, 0, 3, 4), then abracadabra is first assigned the value (3 − 0)² + (4 − 0)² = 25 and then is assigned the value √25 = 5.

Whereas computer scientists are accustomed to pseudocode, we fear that some biologists reading it might decide that pseudocode is too cryptic and therefore useless. Although modern biologists deal with algorithms on a daily basis, the language they use to describe an algorithm might be closer to the language used in a cookbook. Accordingly, some bioinformatics books are written in cookbook lingo as an effort to make biologists feel at home with different bioinformatics concepts. Unfortunately, this language is insufficient to describe the more complex algorithmic ideas behind various bioinformatics tools.

We have thus far described pseudocode only generally. In the next section, we will delve into the nuts and bolts that you can use to write your own pseudocode and that we use throughout our text.

Nuts and Bolts of Pseudocode

This section will describe some but not all details of pseudocode; we will often avoid tedious details in the specification of an algorithm by specifying parts of it in English, using operations that are not listed among the nuts and bolts of the pseudocode, or by omitting certain details. We assume that, in the case of confusion, you will fill in the details using pseudocode operations in a sensible way.

If Statements

Exercise Break: The algorithm Minimum(a, b) shown below has two integers (a and b) as its input and a single integer as its output. Can you guess what it does?

Minimum(a, b)
    if a > b
        return b
    else
        return a

This algorithm, which computes the minimum of a and b, uses the following construction:

if statement X is true
    execute instructions Y
else
    execute instructions Z

If statement X is true, it executes instructions Y; otherwise, it executes instructions Z. For example, Minimum(1, 9) returns 1 because the condition “1 > 9” is false. Sometimes we will omit “else Z”, in which case the pseudocode will either execute Y or not, depending on whether X is true.

The pseudocode below computes the minimum of three numbers:

Minimum3(a, b, c)
    if a > b
        if b > c
            return c
        else
            return b
    else
        if a > c
            return c
        else
            return a

Minimum3 is correct, but below is a more compact version that uses the Minimum function we already wrote as a subroutine.

Minimum3(a, b, c)
    if a > b
        return Minimum(b, c)
    else
        return Minimum(a, c)

Exercise Break: Write pseudocode for an algorithm Minimum4(a, b, c, d) that computes the minimum of four numbers.
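As an illustration of the if/else construction and of calling one algorithm from another, here is a minimal Python sketch of Minimum and the compact Minimum3. The lowercase function names are our own, and each branch returns the subroutine's result explicitly so that the answer actually reaches the caller. (Minimum4 is left as the exercise intends.)

```python
def minimum(a, b):
    """Return the smaller of a and b."""
    if a > b:
        return b
    else:
        return a


def minimum3(a, b, c):
    """Return the smallest of a, b, and c, using minimum as a subroutine."""
    if a > b:
        # a cannot be the smallest, so compare the remaining two
        return minimum(b, c)
    else:
        # b cannot be strictly smaller than a, so compare a and c
        return minimum(a, c)


print(minimum(1, 9))      # 1, because the condition "1 > 9" is false
print(minimum3(7, 2, 5))  # 2
```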

For loops

Exercise Break: The algorithm Gauss(n), shown below, has one integer (n) as its input and one integer as its output. Can you guess what it does?

Gauss(n)
    sum ← 0
    for i ← 1 to n
        sum ← sum + i
    return sum

This pseudocode computes the sum of the first n integers (e.g., Gauss(5) = 1 + 2 + 3 + 4 + 5 = 15) and uses a for loop that has the following format:

for i ← a to b
    execute X

The for loop first sets i to a and executes instructions X. Afterwards, it increases i by 1 and executes instructions X again. It further repeats this process by increasing i by 1 until it becomes equal to b, i.e., i varies through values a, a + 1, a + 2, a + 3, . . ., b − 1, b during execution of the for loop. In other words, a function computing Gauss(5) could be alternatively written as shown below, but we do not endorse this programming style!

Gauss5
    sum ← 0
    i ← 1
    sum ← sum + i
    i ← i + 1
    sum ← sum + i
    i ← i + 1
    sum ← sum + i
    i ← i + 1
    sum ← sum + i
    i ← i + 1
    sum ← sum + i
    return sum

Exercise Break: Can you write pseudocode for Gauss(n) that uses only a single line?

In 1785, a primary school teacher asked his class to add together all the numbers from 1 to 100, assuming that this task would occupy them for the whole day. He was shocked when an 8-year-old boy thought for a few seconds and wrote down the answer 5,050. This boy was Karl Gauss, and he would go on to become one of the greatest mathematicians of all time. The following one-line algorithm implements Gauss’ idea for computing the sum of the first n integers (why does it work?):

Gauss(n)
    return (n+1)*n/2
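The two versions of Gauss can be compared directly in Python. This is a sketch with names of our own choosing (gauss_loop, gauss_formula). Two details differ from the pseudocode: Python's range(1, n + 1) is needed to include n itself, because range excludes its upper bound, and integer division (//) keeps the result an integer, which is safe because (n + 1) * n is always even.

```python
def gauss_loop(n):
    """Sum of the first n integers, computed with a for loop."""
    total = 0
    for i in range(1, n + 1):  # i takes the values 1, 2, ..., n
        total = total + i
    return total


def gauss_formula(n):
    """Sum of the first n integers, computed in a single line."""
    return (n + 1) * n // 2


print(gauss_loop(5), gauss_formula(5))      # 15 15
print(gauss_loop(100), gauss_formula(100))  # 5050 5050
```

The loop performs n additions, whereas the formula needs one addition, one multiplication, and one division regardless of n.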

Below is yet another algorithm computing the sum of the first n integers:

RecursiveGauss(n)
    if n > 0
        sum ← RecursiveGauss(n − 1) + n
    else
        sum ← 0
    return sum
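A direct Python transcription of RecursiveGauss might look as follows; the name recursive_gauss is ours. One practical caveat that the pseudocode does not need to mention: CPython limits recursion depth (about 1000 nested calls by default), so this sketch is meant for small n only.

```python
def recursive_gauss(n):
    """Sum of the first n integers, computed by calling itself on n - 1."""
    if n > 0:
        total = recursive_gauss(n - 1) + n  # subcontract the smaller problem
    else:
        total = 0  # nothing to add: the chain of calls stops here
    return total


print(recursive_gauss(5))  # 15
```

The else branch is what guarantees termination: every call moves n one step closer to 0, where no further call is made.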

Note that RecursiveGauss(n) asks RecursiveGauss(n − 1) for help! Imagine that you need to compute the sum of the first five integers, but you are lazy and ask your friend to compute the sum of the first four integers for you. As soon as your friend has computed it, you add 5 to the result, and you are done! Little do you know that your friend is also lazy, and instead of computing the sum of the first four integers, she asks her friend to compute the sum of the first three numbers, then she adds 4 to the result and passes it to you. The story continues, and although everybody in this chain of friends is lazy, they are able to compute the sum of the first five integers. In this book, you will encounter recursive algorithms that, similarly to RecursiveGauss, subcontract a job by calling themselves (on a smaller input).

Exercise Break: Do you see the pattern in the sequence (1, 1, 2, 3, 5, 8, 13, 21, 34, 55, … )? Write pseudocode for an algorithm that, given an integer n, computes the n-th number in this sequence.

Exercise Break: Analyze the pseudocode Rabbits below. What does Rabbits(10) return?

Rabbits(n)
    a ← 1
    b ← 1
    for i ← 3 to n
        new ← a + b
        a ← b
        b ← new
    return b

Rabbits(n) computes the Fibonacci numbers 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, … (Why do you think we call this function “Rabbits”?)
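To experiment with Rabbits, the pseudocode can be transcribed into Python almost word for word. This is a sketch under the assumption that n is at least 1; the lowercase name rabbits is our own. Note that for n equal to 1 or 2 the loop body never runs, and the initial value of b is returned.

```python
def rabbits(n):
    """Return the n-th number of the sequence 1, 1, 2, 3, 5, ... (n >= 1)."""
    a = 1  # the number two positions back
    b = 1  # the most recent number
    for i in range(3, n + 1):  # i takes the values 3, 4, ..., n
        new = a + b  # next number is the sum of the previous two
        a = b        # shift the window forward by one position
        b = new
    return b


# Print the first few values to compare against the sequence in the text.
print([rabbits(n) for n in range(1, 8)])
```

Only two variables are kept at any time, so the amount of memory used does not grow with n.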

While loops

Exercise Break: The pseudocode AnotherGauss below has one integer (n) as its input and one integer as its output. Can you guess what it does?

AnotherGauss(n)
    sum ← 0
    i ← 1
    while i ≤ n
        sum ← sum + i
        i ← i + 1
    return sum

AnotherGauss solves exactly the same problem as Gauss by using a different type of loop, a while loop that has the following format:

while statement X is true
    execute Y

The while loop checks condition X; if X is true, then it executes instructions Y. It then checks X again; if X is true, it executes Y again. This process is repeated until X is false. (If X winds up always being true, then the while loop enters an infinite loop, which you should avoid at all costs, because your algorithm will never end.)

Arrays

When we write pseudocode, in addition to the concept of a variable, we also use arrays. An array of n elements is an ordered sequence of n variables a₁, a₂, . . . , aₙ. We often use a single letter to denote an array, e.g., a = (a₁, a₂, . . . , aₙ).

Exercise Break: Why did we call the following algorithm Fibonacci? And what does Fibonacci(10) return?

Fibonacci(n)
    a₁ ← 1
    a₂ ← 1
    for i ← 3 to n
        aᵢ ← aᵢ₋₁ + aᵢ₋₂
    return aₙ
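The while loop and the array can both be sketched in Python. The names another_gauss and fibonacci are ours, and n is assumed to be at least 1 in fibonacci. Because Python lists are indexed from 0 while the pseudocode array starts at a₁, this sketch allocates one extra slot and simply leaves index 0 unused; that is one possible convention, not the only one.

```python
def another_gauss(n):
    """Sum of the first n integers, computed with a while loop."""
    total = 0
    i = 1
    while i <= n:  # the condition is re-checked before every pass
        total = total + i
        i = i + 1  # without this line the loop would never end
    return total


def fibonacci(n):
    """Return the n-th element of the array built by the pseudocode (n >= 1)."""
    # Slots 1..n mirror a_1..a_n; slot 0 is unused. At least three slots
    # are allocated so that a[1] and a[2] exist even when n is 1.
    a = [0] * (max(n, 2) + 1)
    a[1] = 1
    a[2] = 1
    for i in range(3, n + 1):
        a[i] = a[i - 1] + a[i - 2]  # each entry is the sum of the two before it
    return a[n]


print(another_gauss(5))  # 15
```

Unlike a version that keeps only the last two values, the array version stores every intermediate value, which is convenient when earlier entries are needed again later.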
