Topic 1011: Topics in Computer Science. Dr J Frost ([email protected]). Last modified: 2nd November 2013


Page 1: Topic 1011:  Topics in Computer Science

Topic 1011: Topics in Computer Science

Dr J Frost ([email protected])

Last modified: 2nd November 2013

Page 2: Topic 1011:  Topics in Computer Science

A note on these Computer Science slides

These slides are intended to give just an introduction to two key topics in Computer Science: algorithms and data structures. Unlike the other topics in the Riemann Zeta Club, they're not intended to give the deeper knowledge required for solving difficult problems. The main intention is to provide an initial base of Computer Science knowledge, which may help you in your university interviews.

In addition to these slides, it’s highly recommended that you study the following Riemann Zeta slides to deal with more specific Computer-Science-ey questions:

1. Logic
2. Combinatorics
3. Pigeonhole Principle

Page 3: Topic 1011:  Topics in Computer Science

Slide Guidance

Any box with a ? can be clicked to reveal the answer (this works particularly well with interactive whiteboards!). Make sure you're viewing the slides in slideshow mode.

For multiple choice questions (e.g. SMC), click your choice to reveal the answer (try below!)

Question: The capital of Spain is:

A: London B: Paris C: Madrid

Page 4: Topic 1011:  Topics in Computer Science

Contents

1. Time and Space Complexity
2. Big O Notation
3. Sets and Lists
4. Binary Search
5. Sorted vs Unsorted Lists
6. Hash Tables
7. Recursive Algorithms
8. Sorting Algorithms

a) Bubble Sort
b) Merge Sort
c) Bogosort

Page 5: Topic 1011:  Topics in Computer Science

Time and Space Complexity

Suppose we had a list of unordered numbers, which the computer can only view one at a time.

1 4 2 9 3 7

Suppose we want to check if the number 8 is in the list. If the size of the problem is n (i.e. there are n numbers in the list), then in the worst case, how much time will it take to check whether some number is in there? And given that the list is stored on a disc (rather than in memory), how much memory (i.e. space) do we need for our algorithm?

(Worst Case) Time Complexity: There are n items to check, and each takes some constant amount of time to check, so we know the time will be at most some constant times n.

Space Complexity: We only need one slot of memory for the number we're checking against the list, and one slot of memory for the current item in the list we're looking at. So the space needed will be constant and, importantly, is not dependent on the size n of our list.

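The scan described above can be sketched in Python (linear_contains is a hypothetical helper name, not something from these slides):

```python
def linear_contains(items, target):
    """Linear search: O(n) worst-case time, O(1) extra space.

    We hold only the target and the current item in memory,
    so the space needed doesn't grow with the list size n."""
    for item in items:           # up to n comparisons in the worst case
        if item == target:
            return True
    return False
```

For example, linear_contains([1, 4, 2, 9, 3, 7], 8) has to look at all six items before concluding the answer is False.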

Page 6: Topic 1011:  Topics in Computer Science

Big O notation

So the time and space complexity of an algorithm gives us a measure of how 'complex' the algorithm is, in terms of the time it'll take and the space required to do its handiwork.

In mathematics, Big O notation is used to measure how some expression grows. Suppose for example we have the function:

y = 2x³ + 10x² + 3

We can see that as x becomes larger, the 10x² and 3 terms become inconsequential, because the 2x³ term dominates.

Since 10x² ≤ 10x³ and 3 ≤ 3x³ for all x ≥ 1, we have y ≤ 15x³.

We're not interested in the scaling of 15, since this doesn't tell us anything about the growth of the function.

We say that f(x) = O(x³), i.e. y grows cubically.

Page 7: Topic 1011:  Topics in Computer Science

Big O notation

Formally, if f(x) = O(g(x)), then there is some constant k such that f(x) ≤ k·g(x) for all sufficiently large x. So technically we could say that f(x) = O(x⁴), because the big-O just provides an upper bound to the growth. But we would want to keep this upper bound as low as possible, so it would be more useful to say that f(x) = O(x³).

1 4 2 9 3 7

While big-O notation has been around since the late 19th century (particularly in number theory), in the 1950s it started to be used to describe the complexity of algorithms.

Returning to our problem of finding a number in an unordered list, we can now express our time and space complexity using big-O notation (in terms of the list size n):

Time Complexity: O(n)

Space Complexity: O(1)

Remember that the constant scaling doesn't matter in big-O notation. So '1' is used to mean constant time/space.

Page 8: Topic 1011:  Topics in Computer Science

Big O notation

Time Complexity: We say the time complexity of the algorithm is…

O(1): constant time
O(n): linear time
O(n²): quadratic time
O(nᵏ): polynomial time
O(2ⁿ): exponential time
O(log n): logarithmic time

We'll see some examples of more algorithms and their complexity in a second, but let's see how we might describe algorithms based on their complexity…

Page 9: Topic 1011:  Topics in Computer Science

Sets and lists

A data structure is, unsurprisingly, some way of structuring data, whether as a tree, a set, a list, a table, etc.

There are two main ways of representing a collection of items: lists and sets.

Lists: ordering of items matters, and duplicates are allowed. Example: <4, -2, 3, 6, 3>

Sets: ordering doesn't matter ({1, 2, 3} and {3, 2, 1} are the same set), and duplicates are not allowed. Example: {1, 3, 7, 10}
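Python happens to have both structures built in, so the distinctions above can be checked directly (a small illustrative sketch, not from the slides):

```python
# Lists: ordering matters and duplicates are significant.
assert [4, -2, 3] != [3, -2, 4]        # different orders give different lists
assert [4, 3, 3] != [4, 3]             # duplicates count

# Sets: ordering is irrelevant and duplicates collapse.
assert {1, 3, 7, 10} == {10, 7, 3, 1}  # the same set
assert {1, 1, 3} == {1, 3}             # duplicates are ignored
```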

Page 10: Topic 1011:  Topics in Computer Science

Binary Search

1 3 4 7 9 12 15 20

Suppose we have either a set or list where the items are in ascending order. We want to determine if the number 14 is in the list.

Previously, when the items were unordered, we had to scan through the whole list (a simple algorithm where the time complexity was O(n)).

But can we do better?

More specifically, seeing if an item is within an unsorted list is known as a linear search. (Because we have to check every item, taking time linear in n!)

Page 11: Topic 1011:  Topics in Computer Science

Binary Search

1 3 4 7 9 12 15 20

This line represents where the number we’re looking for could possibly be. At the start of a binary search, the number could be anywhere.

A sensible thing to do is to look at the number just after the centre. That way, we can narrow down our search by half in one step.

In this case 14 > 9, so we know that if the number is in the collection, it must be in the second half of it.

Looking to see if 14 is in our list/set.

Page 12: Topic 1011:  Topics in Computer Science

Binary Search

1 3 4 7 9 12 15 20

Now we look halfway across what we have left to check. The number just after the halfway point is 15.

Since 14 < 15, if 14 is in our collection, it must be to the left of this point.

Looking to see if 14 is in our list/set.

Page 13: Topic 1011:  Topics in Computer Science

Binary Search

1 3 4 7 9 12 15 20

Now we’d compare our number 14 against the 12.

Now since 12 < 14 < 15, and there are no items between them, we know that 14 is not in the collection of items.

Looking to see if 14 is in our list/set.

Page 14: Topic 1011:  Topics in Computer Science

Binary Search

1 3 4 7 9 12 15 20

We can see that on each step, we halve the number of items that need to be searched. The number of steps (i.e. the time complexity) in terms of the number of items must therefore be:

Time Complexity: O(log n)

This makes sense when you think about it. If n = 16, then log₂ 16 = 4, i.e. we can halve 16 four times until we get to 1, so only 4 steps are needed. You might be wondering why we wrote O(log n) instead of O(log₂ n). This is because changing the base of a log only scales by a constant, and as we saw, big-O notation doesn't care about constant scaling. So the base is irrelevant.

Space Complexity: O(1)

We only ever look at one number at a time, so we only need a constant amount of memory.
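The whole procedure can be written as a short loop; this is a Python sketch rather than anything from the slides:

```python
def binary_search(sorted_items, target):
    """Binary search on a sorted list: O(log n) time, O(1) space.

    Each comparison halves the range that could still
    contain the target."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return True
        elif sorted_items[mid] < target:
            lo = mid + 1    # target can only be in the right half
        else:
            hi = mid - 1    # target can only be in the left half
    return False
```

binary_search([1, 3, 4, 7, 9, 12, 15, 20], 14) returns False after three comparisons (the exact midpoints differ slightly from the slides' "just after the centre" convention, but the halving is the same).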

Page 15: Topic 1011:  Topics in Computer Science

Sorted vs unsorted lists

Keeping our list sorted, or leaving it unsorted, has advantages either way. We've already seen that keeping the list sorted makes it much quicker to see if the list contains an item or not. What is the time complexity of the best algorithm to do each of these tasks?

Seeing if the list contains a particular value: O(log n) sorted; O(n) unsorted.

Adding an item to the list: Sorted: we find the correct position to insert in O(log n) time using a binary search. If we have some easy way to splice the new item in somewhere in the middle of the list, without having to move the items after it up to make space, then we're done in O(log n). However, if we do have to move the items after it up (e.g. the values are stored in an 'array'), then it takes O(n) time to shift the items up, hence it's O(n) overall. Unsorted: we can just stick the item on the end, in O(1)!

Merging two lists (of size m and n respectively, where n ≤ m): Sorted: start with the larger list, with its m items, then insert each of the n items from the second list into it. Each insert operation costs O(log m) time (from above), and there are n items to add, giving O(n log m). Unsorted: easy again, O(1). Just have the end of the first list somehow link to the start of the second list so that they're joined together.

Page 16: Topic 1011:  Topics in Computer Science

Sorted vs unsorted lists

We can see that the advantage of keeping the list unsorted is that it’s much quicker to insert new items into the list. However, it’s much slower to find/retrieve an item in the list, because we can’t exploit binary search.

So it’s a trade-off.

Sorted vs unsorted:

Seeing if the list contains a particular value: O(log n) sorted; O(n) unsorted.
Adding an item to the list: O(log n) sorted; O(1) unsorted.
Merging two lists (of size m and n respectively, where n ≤ m): O(n log m) sorted; O(1) unsorted.
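Python's standard bisect module makes the sorted-insert trade-off easy to see; this sketch assumes the list is array-backed, so the splice shifts later items up:

```python
import bisect

def insert_sorted(sorted_list, value):
    """Find the position in O(log n) by binary search, but for an
    array-backed list the splice shifts later items up, so the
    whole operation is O(n)."""
    bisect.insort(sorted_list, value)

def insert_unsorted(lst, value):
    """Unsorted list: just stick the item on the end, O(1)."""
    lst.append(value)
```

Usage: insert_sorted on [1, 3, 7, 9] with value 4 gives [1, 3, 4, 7, 9], while insert_unsorted simply appends.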

Page 17: Topic 1011:  Topics in Computer Science

Hash Table

Hash tables are structures which allow us to do certain operations on collections much more quickly: e.g. inserting a value into the collection, and retrieving it!

Imagine we had 10 ‘buckets’ to put new values into. Suppose we had a rule which decided what bucket to put a value into:

Find the remainder when the value is divided by 10 (i.e. value mod 10)

0 1 2 3 4 5 6 7 8 9

Page 18: Topic 1011:  Topics in Computer Science

Hash Table

0 1 2 3 4 5 6 7 8 9

2 31 67 42 19 112 55 57 29 33 69

We can use our “mod 10” hash function to insert new values into our hash table.

Page 19: Topic 1011:  Topics in Computer Science

Hash Table

Bucket 1: 31
Bucket 2: 2, 42, 112
Bucket 3: 33
Bucket 5: 55
Bucket 7: 67, 57
Bucket 9: 19, 29, 69
(Buckets 0, 4, 6 and 8 are empty.)

The great thing about a hash table is that if we want to check if some value is contained within it, we only need to check within the bucket it corresponds to.

e.g. Is 65 in our hash table?Using the same hash function, we’d just check “Bucket 5”. At this point, we might just do a linear search of the items in the bucket to see if the 65 matches. In this case, we’d conclude that 65 isn’t part of our collection of numbers.
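The bucket scheme can be sketched with plain Python lists; make_table and table_contains are hypothetical names, not from the slides:

```python
def make_table(values, k=10):
    """A toy hash table: k buckets, hash function 'value mod k'."""
    buckets = [[] for _ in range(k)]
    for v in values:
        buckets[v % k].append(v)   # each value goes into its bucket
    return buckets

def table_contains(buckets, value):
    """Only the one matching bucket is searched linearly: on average
    about n/k items, if the hash spreads values evenly."""
    return value in buckets[value % len(buckets)]
```

Checking table_contains(table, 65) for the values above inspects only Bucket 5 (which holds just 55), so it answers False without touching the other ten values.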

Page 20: Topic 1011:  Topics in Computer Science

Hash Table


Suppose we've put n items in a hash table with k buckets:

Seeing if some number is contained in our collection: O(n/k). But only if our chosen hash function distributes items fairly evenly across buckets. If our data tended to have 1 as the last digit, "mod 10" would be a bad hash function, because all the items would end up in the same bucket. The result would be that if we wanted to check if 71 was in our collection, we'd end up having to check every item still! Using "mod p", where p is a prime, reduces this problem.

Inserting a new item into the hash table structure: O(1). Presuming the hash function takes a constant amount of time to evaluate, we just stick the new item at the top of the correct bucket. We could always keep the buckets sorted, in which case insertion would take O(log(n/k)) time.

Page 21: Topic 1011:  Topics in Computer Science

Recursive Algorithms

The Towers of Hanoi is a classic game in which the aim is to get the ‘tower’ (composed of varying sized discs) from the first peg to the last peg. There’s one ‘spare peg’ available.

The only rule is that a larger disc can never be on top of a smaller disc, i.e. on any peg the discs must be in decreasing size order from bottom to top.

There are two questions we might ask:
1. For n discs, what is the minimum number of moves required to win?
2. Is there an algorithm which generates the sequence of moves required?

Page 22: Topic 1011:  Topics in Computer Science

Recursive Algorithms

We can answer both questions at the same time.

Suppose HANOI(START,SPARE,GOAL,n) is a function which generates a sequence of moves for n discs, where START is the start peg, SPARE is the spare peg and GOAL is the goal peg.

Then we can define an algorithm as such:

Page 23: Topic 1011:  Topics in Computer Science

Recursive Algorithms

Recursively solve the problem of moving the top n-1 discs from the start peg to the spare peg.

i.e. HANOI(START,GOAL,SPARE, n-1)

(notice that we've made the original goal peg the new spare peg and vice versa)

It's quite common to define a function in terms of itself but with smaller arguments. It's recommended you first look at some of the examples in the Recurrence Relations section of the RZC Combinatorics slides to get your head around this.

Page 24: Topic 1011:  Topics in Computer Science

Recursive Algorithms

Next move the 1 remaining disc (or whatever disc is at the top of the peg) from the start to goal peg.

i.e. MOVE(START,GOAL)

Page 25: Topic 1011:  Topics in Computer Science

Recursive Algorithms

Finally, recursively solve the problem moving n-1 discs from the spare peg to the target peg.

i.e. HANOI(SPARE,START,GOAL, n-1)

Notice here that the original start peg is now the spare peg, and the original spare peg is now the start peg.

Page 26: Topic 1011:  Topics in Computer Science

Recursive Algorithms

Putting this together, we have the algorithm:

FUNCTION HANOI(START, SPARE, GOAL, n) =
    HANOI(START, GOAL, SPARE, n-1),
    MOVE(START, GOAL),
    HANOI(SPARE, START, GOAL, n-1)

But just like recurrences in maths, we need a ‘base case’, to say what happens when we only have to solve the problem when n=1 (i.e. we have one disc):

FUNCTION HANOI(START, SPARE, GOAL, 1) = MOVE(START, GOAL)
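The two-line definition above translates almost directly into Python; this sketch returns the list of moves rather than printing them:

```python
def hanoi(start, spare, goal, n):
    """Generate the sequence of moves for n discs as (from, to) pairs."""
    if n == 1:
        return [(start, goal)]                   # base case: one disc
    return (hanoi(start, goal, spare, n - 1)     # n-1 discs to the spare peg
            + [(start, goal)]                    # largest disc straight to goal
            + hanoi(spare, start, goal, n - 1))  # n-1 discs on top of it
```

hanoi('A', 'B', 'C', 3) produces 7 moves, starting with ('A', 'C').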

Page 27: Topic 1011:  Topics in Computer Science

Recursive Algorithms

We can see this algorithm in action. If the 3 pegs are A, B and C, and we have 3 discs, then we want to execute HANOI(A, B, C, 3) to get our moves:

HANOI(A, B, C, 3)
= HANOI(A, C, B, 2), MOVE(A, C), HANOI(B, A, C, 2)
= HANOI(A, B, C, 1), MOVE(A, B), HANOI(C, A, B, 1), MOVE(A, C), HANOI(B, C, A, 1), MOVE(B, C), HANOI(A, B, C, 1)
= MOVE(A, C), MOVE(A, B), MOVE(C, B), MOVE(A, C), MOVE(B, A), MOVE(B, C), MOVE(A, C)

A B C

Page 28: Topic 1011:  Topics in Computer Science

Recursive Algorithms

The same approach applies when counting the minimum number of moves.Let F(n) be the number of moves required to move n discs to the target peg.

1. We require F(n-1) moves to move n-1 discs from the start to the spare peg.
2. We require 1 move to move the remaining disc to the goal peg.
3. We require F(n-1) moves to move n-1 discs from the spare to the goal peg.

This gives us the recurrence relation F(n) = 2F(n-1) + 1, and our base case is F(1) = 1, since it only requires 1 move to move 1 disc. But just writing out the first few terms in this sequence, it's easy to spot that the position-to-term formula is F(n) = 2ⁿ − 1.

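The recurrence and the closed form can be checked against each other in a couple of lines (a quick sketch, not from the slides):

```python
def moves(n):
    """F(n) = 2F(n-1) + 1 with F(1) = 1: the Hanoi move count."""
    return 1 if n == 1 else 2 * moves(n - 1) + 1

# The closed form F(n) = 2^n - 1 agrees for the first few terms.
assert all(moves(n) == 2 ** n - 1 for n in range(1, 11))
```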

Page 29: Topic 1011:  Topics in Computer Science

Sorting Algorithms

One very fundamental algorithm in Computer Science is sorting a collection of items so that they are in order (whether in numerical order, or some order we’ve defined).

We’ll look at the main well-known algorithms, and look at their time complexity.

2 31 67 42 19 112 55

Page 30: Topic 1011:  Topics in Computer Science

Bubble Sort

2 31 67 42 19 112 55

This looks at each pair of numbers, starting with the 1st and 2nd, then the 2nd and 3rd, and swaps them if they're in the wrong order.

At the end of the first 'pass'*, we can guarantee that the largest number will be at the end of the list. We then repeat the process, but we can now ignore the last number (because it's in the correct position). This continues until, eventually, on the last pass we only need to compare the first two items.

* A 'pass' in an algorithm means that we've looked through all the values (or some subset of them) within this stage. You can think of a pass as someone checking your university personal statement and making corrections, before you give this updated draft to another person for an additional 'pass'.
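The passes described above can be written as two nested loops; a minimal Python sketch:

```python
def bubble_sort(items):
    """Bubble sort in place. Pass i makes n-1-i comparisons, so the
    total is (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons."""
    n = len(items)
    for i in range(n - 1):              # one pass per outer iteration
        for j in range(n - 1 - i):      # last i items are already in place
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items
```

bubble_sort([2, 31, 67, 42, 19, 112, 55]) returns [2, 19, 31, 42, 55, 67, 112].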

Page 31: Topic 1011:  Topics in Computer Science

Bubble Sort

2 31 67 42 19 112 55

Time Complexity: O(n²)

The first pass requires n-1 comparisons, the next pass requires n-2 comparisons, and so on, giving us the sum of an arithmetic sequence. So the exact number of comparisons is ½n(n-1). This is growth quadratic in n, i.e. O(n²).

Page 32: Topic 1011:  Topics in Computer Science

Merge Sort

First treat each individual value as an individual list (with 1 item in it!). Then we repeatedly merge each pair of lists, until we only have 1 big fat list:

2 31 67 42 19 112 55 4
(2 31) (42 67) (19 112) (4 55)
(2 31 42 67) (4 19 55 112)
(2 4 19 31 42 55 67 112)

We’ll go into more detail on this ‘merge’ operation on the next slide.

Page 33: Topic 1011:  Topics in Computer Science

Merge Sort

2 31 42 67    4 19 55 112

At each point in the algorithm, we know each smaller list will be in order. Merging two sorted lists can be done quite quickly.

General gist: start with a marker at the beginning of each list. Compare the two elements at the markers. The lower value gets put in the new list, and the marker at the item used moves up one. Then repeat!

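The two-marker merge described above, sketched in Python:

```python
def merge(left, right):
    """Merge two sorted lists: repeatedly take the smaller of the two
    front elements. Each comparison places one element, so the work is
    proportional to len(left) + len(right)."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1                        # advance the left marker
        else:
            merged.append(right[j])
            j += 1                        # advance the right marker
    return merged + left[i:] + right[j:]  # one list ran out: append the rest
```

For example, merging [2, 31, 42, 67] with [4, 19, 55, 112] gives the fully sorted list of all eight values.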

Page 34: Topic 1011:  Topics in Computer Science

Merge Sort

Time Complexity: O(n log n)

Each merging phase requires exactly n steps, because when merging each pair of lists, each comparison puts an element in our new list, so there are exactly n comparisons. There are log₂ n phases because, similarly to the binary search, each phase halves the number of mini-lists.

Page 35: Topic 1011:  Topics in Computer Science

Bogosort

The Bogosort, also known as 'Stupid Sort', is intentionally a 'joke' sorting algorithm, but it provides some educational value. It simply goes like this:

1. Put all the elements of the list in a completely random order.
2. Check if the elements are in order. If so, you're done. If not, then go back to Step 1.
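Those two steps translate directly into Python (is_sorted and bogosort are illustrative names, not from the slides):

```python
import random

def is_sorted(items):
    """Step 2: check every adjacent pair, O(n) time."""
    return all(items[i] <= items[i + 1] for i in range(len(items) - 1))

def bogosort(items):
    """Shuffle until sorted. We expect about n! shuffles on average,
    each followed by an O(n) check; in the worst case the algorithm
    may never terminate."""
    while not is_sorted(items):
        random.shuffle(items)            # step 1: completely random order
    return items
```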

We can describe time complexity in different ways: the 'worst-case behaviour' (i.e. the longest amount of time the algorithm can possibly take) and the 'average-case behaviour' (i.e. how long we expect the algorithm to take on average).

Worst Case Time Complexity: unbounded. The algorithm theoretically may never terminate, because the order may be wrong every time.

Average Case Time Complexity: O(n · n!)

There are n! possible ways the items can be ordered. Presuming no duplicates in the list, there's a 1 in n! chance that the list is in the correct order, so we 'expect' to have to repeat Step 1 n! times. Each check in Step 2 requires checking all the elements, which is O(n) time. It might be worth checking out the Geometric Distribution in the RZC Probability slides.