Topic 1011: Topics in Computer Science
Dr J Frost ([email protected])
Last modified: 2nd November 2013
A note on these Computer Science slides
These slides are intended to give just an introduction to two key topics in Computer Science: algorithms and data structures. Unlike the other topics in the Riemann Zeta Club, they're not intended to give the deeper knowledge required for solving difficult problems. The main intention is to provide an initial base of Computer Science knowledge, which may help you in your university interviews.
In addition to these slides, it's highly recommended that you study the following Riemann Zeta slides to deal with more specific Computer-Science-flavoured questions:
Logic
Combinatorics
Pigeonhole Principle
?
Any box with a ? can be clicked to reveal the answer (this works particularly well with interactive whiteboards!).
Make sure you're viewing the slides in slideshow mode.
For multiple choice questions (e.g. SMC), click your choice to reveal the answer (try below!)
Question: The capital of Spain is:
A: London
B: Paris
C: Madrid
Slide Guidance
Contents
Time and Space Complexity
Big O Notation
Sets and Lists
Binary Search
Sorted vs Unsorted Lists
Hash Tables
Recursive Algorithms
Sorting Algorithms
Bubble Sort
Merge Sort
Bogosort
Time and Space Complexity
Suppose we had a list of unordered numbers, which the computer can only view one at a time.
1
4
2
9
3
7
Suppose we want to check if the number 8 is in the list.
If the size of the problem is n (i.e. there are n cards in the list), then in the worst case, how much time will it take to check whether some number is in there?
And given that the list is stored on a disc (rather than in memory), how much memory (i.e. space) do we need for our algorithm?
(Worst Case) Time Complexity
If there are n items to check, and each takes some constant amount of time to check, then we know the time will be at most some constant times n.
Space Complexity
We only need one slot of memory for the number we're checking against the list, and one slot of memory for the current item in the list we're looking at. So the space needed is constant, and importantly, is not dependent on the size n of our list.
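The scan just described can be sketched in Python (an illustrative sketch, not code from the slides):

```python
def contains(items, target):
    """Linear scan: O(n) time in the worst case, O(1) extra space."""
    for item in items:          # look at one item at a time
        if item == target:
            return True
    return False

print(contains([1, 4, 2, 9, 3, 7], 8))  # False: 8 is not in the list
```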
Big O notation
So the time and space complexity of an algorithm gives us a measure of how complex the algorithm is, in terms of the time it'll take and the space required to do its handiwork.
In mathematics, Big O notation is used to measure how some expression grows.
Suppose for example we have the function:
y = 2x³ + 10x² + 3
We can see that as x becomes larger, the 10x² and 3 terms become inconsequential, because the 2x³ term dominates.
Since 10x² ≤ 10x³ and 3 ≤ 3x³ for all x ≥ 1, we have y ≤ 15x³ for all x ≥ 1.
We're not interested in the scaling of 15, since this doesn't tell us anything about the growth of the function.
We say that y = O(x³), i.e. y grows cubically.
Big O notation
Formally, if f(x) = O(g(x)), then there is some constant c such that f(x) ≤ c·g(x) for all sufficiently large x. So technically we could say that y = O(x⁴), because the big-O just provides an upper bound to the growth. But we would want to keep this upper bound as low as possible, so it is more useful to say that y = O(x³).
While big-O notation has been around for centuries (particularly in number theory), in the 1950s, it started to be used to describe the complexity of algorithms.
Returning to our problem of finding a number in an unordered list, we can now express our time and space complexity using big-O notation (in terms of the list size n):
Time Complexity: O(n)
Space Complexity: O(1)
Remember that constant scaling doesn't matter in big-O notation, so O(1) is used to mean constant time/space.
Big O notation
Time Complexity
We'll see some examples of more algorithms and their complexity in a second, but let's first see how we might describe algorithms based on their complexity.
We say the time complexity of the algorithm is:
O(1): constant time
O(n): linear time
O(n²): quadratic time
O(2ⁿ): exponential time
O(log n): logarithmic time
O(nᵏ): polynomial time
Sets and lists
A data structure is, unsurprisingly, some way of structuring data, whether as a tree, a set, a list, a table, etc.
There are two main ways of representing a collection of items: lists and sets.
Lists:
Does ordering of items matter? Yes.
Duplicates allowed? Yes.
Example: (1, 2, 2, 3)
Sets:
Does ordering of items matter? No: {1, 2} and {2, 1} are the same set.
Duplicates allowed? No.
Example: {1, 2, 3}
Binary Search
1
3
4
7
9
12
15
20
Suppose we have either a set or list where the items are in ascending order. We want to determine if the number 14 is in the list.
Previously, when the items were unordered, we had to scan through the whole list (a simple algorithm where the time complexity was O(n)).
But can we do better?
More specifically, seeing if an item is within an unsorted list is known as a linear search (because we have to check every item, taking time linear in n!).
Binary Search
This line represents where the number were looking for could possibly be. At the start of a binary search, the number could be anywhere.
A sensible thing to do is to look at the number just after the centre. That way, we can narrow down our search by half in one step.
In this case 14 > 9, so we know that if the number is in the collection, it must be in the second half of it.
Looking to see if 14 is in our list/set.
Binary Search
Now we look halfway across what we have left to check. The number just after the halfway point is 15.
Since 14 < 15, if 14 is in our collection, it must be to the left of this point.
Looking to see if 14 is in our list/set.
Binary Search
Now we'd compare our number 14 against the 12.
Since 12 < 14 < 15, and there are no unchecked items left between them, we now know that 14 is not in the collection of items.
Looking to see if 14 is in our list/set.
Binary Search
We can see that on each step, we halve the number of items that need to be searched. The number of steps (i.e. the time complexity) in terms of the number of items n must therefore be:
Time Complexity: O(log n)
This makes sense when you think about it. If n = 16, then log₂ 16 = 4, i.e. we can halve 16 four times until we get to 1, so only 4 steps are needed.
You might be wondering why we wrote O(log n) instead of O(log₂ n). This is because changing the base of a log only scales it by a constant, and as we saw, big-O notation doesn't care about constant scaling. So the base is irrelevant.
Space Complexity: O(1)
We only ever look at one number at a time, so we only need a constant amount of memory.
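The halving procedure above can be sketched in Python (an illustrative sketch, not code from the slides):

```python
def binary_search(sorted_items, target):
    """Return True if target is in sorted_items. O(log n) time, O(1) space."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2           # look at the middle item
        if sorted_items[mid] == target:
            return True
        elif sorted_items[mid] < target:
            lo = mid + 1               # target can only be in the right half
        else:
            hi = mid - 1               # target can only be in the left half
    return False

print(binary_search([1, 3, 4, 7, 9, 12, 15, 20], 14))  # False
```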
Sorted vs unsorted lists
Keeping our list sorted, or leaving it unsorted, has advantages either way. We've already seen that keeping the list sorted makes it much quicker to see whether the list contains an item or not.
What is the time complexity of the best algorithm to do each of these tasks?
Seeing if the list contains a particular value:
Sorted: O(log n), using a binary search.
Unsorted: O(n), using a linear search.
Adding an item to the list:
Sorted: O(n). We find the correct position to insert in O(log n) time using a binary search. If we have some easy way to splice the new item in somewhere in the middle of the list, without having to move the items after it up to make space, then we're done. However, if we do have to move up the items after it (e.g. the values are stored in an array), then it takes O(n) time to shift the items up, hence it's O(n) time overall.
Unsorted: O(1). We can just stick the item on the end!
Merging two lists (of size n and m respectively, where n ≥ m):
Sorted: O(mn). Start with the larger list, with its n items. Then insert each of the m items from the second list into it. Each insert operation costs O(n) time (from above), and there are m items to add.
Unsorted: O(1). Easy again. Just have the end of the first list somehow link to the start of the second list so that they're joined together.
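Python's standard bisect module illustrates the sorted-insertion trade-off for array-backed lists (an illustrative sketch):

```python
import bisect

sorted_list = [1, 3, 4, 7, 9, 12, 15, 20]

# Finding the insertion point is a binary search: O(log n) comparisons...
pos = bisect.bisect_left(sorted_list, 14)

# ...but list.insert must shift every later item up one slot: O(n) overall.
sorted_list.insert(pos, 14)

print(sorted_list)  # [1, 3, 4, 7, 9, 12, 14, 15, 20]
```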
Sorted vs unsorted lists
We can see that the advantage of keeping the list unsorted is that it's much quicker to insert new items into the list. However, it's much slower to find/retrieve an item in the list, because we can't exploit binary search.
So it's a trade-off.
Seeing if the list contains a particular value: Sorted O(log n), Unsorted O(n)
Adding an item to the list: Sorted O(n), Unsorted O(1)
Merging two lists (of size n and m respectively, where n ≥ m): Sorted O(mn), Unsorted O(1)
Hash Table
Hash tables are a data structure which allows us to do certain operations on collections much more quickly: e.g. inserting a value into the collection, and retrieving it!
Imagine we had 10 buckets to put new values into. Suppose we had a rule which decided which bucket to put a value v into:
Find the remainder when v is divided by 10 (i.e. v mod 10)
1
2
3
4
5
6
7
8
9
0
Hash Table
1
2
3
4
5
6
7
8
9
0
2
31
67
42
19
112
55
57
29
33
69
We can use our mod 10 hash function to insert new values into our hash table.
Hash Table
The great thing about a hash table is that if we want to check if some value is contained within it, we only need to check within the bucket it corresponds to.
e.g. Is 65 in our hash table?
Using the same hash function, we'd just check Bucket 5. At this point, we might just do a linear search of the items in the bucket to see if any of them matches 65. In this case, we'd conclude that 65 isn't part of our collection of numbers.
Hash Table
Suppose we've put n items in a hash table with k buckets:
Seeing if some number is contained in our collection: O(n/k)
But only if our chosen hash function distributes items fairly evenly across buckets. If our data tended to have 1 as the last digit, mod 10 would be a bad hash function, because all the items would end up in the same bucket. The result would be that if we wanted to check whether 71 was in our collection, we'd still end up having to check every item! Using mod p, where p is a prime, reduces this problem.
Inserting a new item into the hash table structure: O(1)
Presuming the hash function takes a constant amount of time to evaluate, we just stick the new item at the top of the correct bucket. We could always keep the buckets sorted, in which case insertion would take O(log(n/k)) time.
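A toy version of this bucket scheme in Python (an illustrative sketch; the class name and default bucket count are our own choices, and Python's built-in set and dict use the same idea internally):

```python
class HashTable:
    """A toy hash table: k buckets, with v mod k choosing the bucket."""

    def __init__(self, k=10):
        self.k = k
        self.buckets = [[] for _ in range(k)]

    def insert(self, value):
        # O(1): put the new item at the end of its bucket
        self.buckets[value % self.k].append(value)

    def contains(self, value):
        # O(n/k) on average: linear search within a single bucket
        return value in self.buckets[value % self.k]

table = HashTable()
for v in [2, 31, 67, 42, 19, 112, 55, 57, 29, 33, 69]:
    table.insert(v)

print(table.contains(112))  # True
print(table.contains(65))   # False: only Bucket 5 needs checking
```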
Recursive Algorithms
The Towers of Hanoi is a classic game in which the aim is to get the tower (composed of varying-sized discs) from the first peg to the last peg. There's one spare peg available.
The only rule is that a larger disc can never be on top of a smaller disc, i.e. on any peg the discs must be in decreasing size order from bottom to top.
There are two questions we might ask:
1. For n discs, what is the minimum number of moves required to win?
2. Is there an algorithm which generates the sequence of moves required?
Recursive Algorithms
We can answer both questions at the same time.
Suppose HANOI(START,SPARE,GOAL,n) is a function which generates a sequence of moves for n discs, where START is the start peg, SPARE is the spare peg and GOAL is the goal peg.
Then we can define the algorithm as follows:
Recursive Algorithms
Recursively solve the problem of moving n-1 discs from the start peg to the spare peg.
i.e. HANOI(START,GOAL,SPARE, n-1)
(notice that we've made the original goal peg the new spare peg and vice versa)
It's quite common to define a function in terms of itself but with smaller arguments. It's recommended you first look at some of the examples in the Recurrence Relations section of the RZC Combinatorics slides to get your head around this.
Recursive Algorithms
Next, move the 1 remaining disc (i.e. whichever disc is at the top of the start peg) from the start peg to the goal peg.
i.e. MOVE(START,GOAL)
Recursive Algorithms
Finally, recursively solve the problem of moving n-1 discs from the spare peg to the goal peg.
i.e. HANOI(SPARE,START,GOAL, n-1)
Notice here that the original start peg is now the spare peg, and the spare peg is now the start peg.
Recursive Algorithms
Putting this together, we have the algorithm:
FUNCTION HANOI(START, SPARE, GOAL, n) =
HANOI(START, GOAL, SPARE, n-1),
MOVE(START, GOAL),
HANOI(SPARE, START, GOAL, n-1)
But just like recurrences in maths, we need a base case to say what happens when n=1 (i.e. we have just one disc):
FUNCTION HANOI(START, SPARE, GOAL, 1) =
MOVE(START, GOAL)
Recursive Algorithms
We can see this algorithm in action. If the 3 pegs are A, B and C, and we have 3 discs, then we want to execute HANOI(A, B, C, 3) to get our moves:
HANOI(A, B, C, 3)
= HANOI(A, C, B, 2), MOVE(A, C), HANOI(B, A, C, 2)
= HANOI(A, B, C, 1), MOVE(A, B), HANOI(C, A, B, 1), MOVE(A, C),
HANOI(B, C, A, 1), MOVE(B, C), HANOI(A, B, C, 1)
= MOVE(A, C), MOVE(A, B), MOVE(C, B), MOVE(A, C), MOVE(B, A),
MOVE(B, C), MOVE(A, C)
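The HANOI pseudocode translates almost directly into Python (an illustrative sketch; here moves are returned as (from, to) pairs):

```python
def hanoi(start, spare, goal, n):
    """Return the list of moves for n discs, mirroring the slides' pseudocode."""
    if n == 1:                                   # base case: one disc
        return [(start, goal)]
    return (hanoi(start, goal, spare, n - 1)     # n-1 discs: start -> spare
            + [(start, goal)]                    # largest disc: start -> goal
            + hanoi(spare, start, goal, n - 1))  # n-1 discs: spare -> goal

print(hanoi("A", "B", "C", 3))
# [('A', 'C'), ('A', 'B'), ('C', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('A', 'C')]
```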
Recursive Algorithms
The same approach applies when counting the minimum number of moves.
Let F(n) be the number of moves required to move n discs to the target peg.
We require F(n-1) moves to move n-1 discs from the start to spare peg.
We require 1 move to move the remaining disc to the goal peg.
We require F(n-1) moves to move n-1 discs from the spare to goal peg.
This gives us the recurrence relation F(n) = 2F(n-1) + 1
And our base case is F(1) = 1, since it only requires 1 move to move 1 disc.
But just writing out the first few terms of this sequence (1, 3, 7, 15, ...), it's easy to spot that the position-to-term formula is F(n) = 2ⁿ - 1.
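A quick sanity check of the recurrence against the closed form (illustrative only):

```python
def f(n):
    # F(1) = 1, F(n) = 2*F(n-1) + 1
    return 1 if n == 1 else 2 * f(n - 1) + 1

for n in range(1, 8):
    assert f(n) == 2**n - 1      # matches the closed form F(n) = 2^n - 1

print([f(n) for n in range(1, 6)])  # [1, 3, 7, 15, 31]
```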
Sorting Algorithms
One very fundamental algorithm in Computer Science is sorting a collection of items so that they are in order (whether in numerical order, or some other order we've defined).
We'll look at the main well-known algorithms and their time complexity.
2
31
67
42
19
112
55
Bubble Sort
At the end of the first pass*, we can guarantee that the largest number will be at the end of the list.
We then repeat the process, but we can now ignore the last number (because it's in the correct position). This continues until, on the last pass, we only need to compare the first two items.
* A pass in an algorithm means that we've looked through all the values (or some subset of them) within this stage. You can think of a pass as someone checking your university personal statement and making corrections, before you give the updated draft to another person for an additional pass.
This looks at each pair of adjacent numbers in turn, starting with the 1st and 2nd, then the 2nd and 3rd, and swaps them if they're in the wrong order:
Bubble Sort
Time Complexity? O(n²)
The first pass requires n-1 comparisons, the next pass requires n-2 comparisons, and so on, giving us the sum of an arithmetic sequence.
So the exact number of comparisons is ½n(n-1).
This is growth quadratic in n, i.e. O(n²).
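A bubble sort sketch in Python (illustrative; the slides describe the algorithm only in prose):

```python
def bubble_sort(items):
    """Sort in place by repeated passes: O(n^2) comparisons in the worst case."""
    n = len(items)
    for end in range(n - 1, 0, -1):   # each pass fixes one item at the end
        for i in range(end):          # compare adjacent pairs
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items

print(bubble_sort([2, 31, 67, 42, 19, 112, 55]))  # [2, 19, 31, 42, 55, 67, 112]
```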
Merge Sort
4
First treat each individual value as an individual list (with 1 item in it!)
Then we repeatedly merge each pair of lists, until we only have 1 big fat list.
2
31
67
42
19
112
55
4
31
42
19
55
2
67
112
4
We'll go into more detail on this merge operation on the next slide.
Merge Sort
At each point in the algorithm, we know each smaller list will be in order.
Merging two sorted lists can be done quite quickly (click the button below):
General gist: Start with a marker at the beginning of each list.
Compare the two elements at the markers. The lower value gets put into the new list, and the marker for the item used moves up one. Then repeat!
New merged list
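The two-marker merge described above, sketched in Python (an illustrative sketch):

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list in O(len(left) + len(right)) time."""
    merged = []
    i = j = 0                       # one marker per list
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:     # lower value goes into the new list
            merged.append(left[i])
            i += 1                  # move that list's marker up one
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])         # one list ran out: copy the rest of the other
    merged.extend(right[j:])
    return merged

print(merge([2, 31, 42, 67], [4, 19, 55, 112]))  # [2, 4, 19, 31, 42, 55, 67, 112]
```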
Merge Sort
Time Complexity? O(n log n)
Each merging phase requires roughly n steps, because when merging each pair of lists, each comparison puts an element into a new list, so there are at most n comparisons per phase.
There are log₂ n phases because, similarly to the binary search, each phase halves the number of mini-lists.
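Putting this together: a top-down recursive merge sort, which is equivalent to the bottom-up phases shown in the slides (an illustrative sketch):

```python
def merge_sort(items):
    """O(n log n): split in half, sort each half, then merge the sorted halves."""
    if len(items) <= 1:             # a list of 0 or 1 items is already sorted
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # merge the two sorted halves using the two-marker technique
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([2, 31, 67, 42, 19, 112, 55, 4]))  # [2, 4, 19, 31, 42, 55, 67, 112]
```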
Bogosort
The Bogosort, also known as Stupid Sort, is intentionally a joke sorting algorithm, but it provides some educational value. It simply goes like this:
Step 1: Put all the elements of the list in a completely random order.
Step 2: Check if the elements are in order. If so, you're done. If not, go back to Step 1.
We can describe time complexity in different ways: the worst-case behaviour (i.e. the longest amount of time the algorithm could possibly take) and the average-case behaviour (i.e. how long we expect the algorithm to take on average).
Worst Case Time Complexity?
The algorithm theoretically may never terminate, because the order may be wrong every time.
Average Case Time Complexity?
O(n · n!)
There are n! possible ways the items can be ordered. Presuming no duplicates in the list, there's a 1 in n! chance that the list is in the correct order. We therefore expect to have to repeat Step 1 n! times.
Each check in Step 2 requires checking all the elements, which is O(n) time.
It might be worth checking out the Geometric Distribution in the RZC Probability slides.
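A bogosort sketch in Python (illustrative; not for real use):

```python
import random

def is_sorted(items):
    # Step 2: check every adjacent pair, which takes O(n) time
    return all(items[i] <= items[i + 1] for i in range(len(items) - 1))

def bogosort(items):
    # Step 1: shuffle until sorted; expected O(n * n!) time overall
    while not is_sorted(items):
        random.shuffle(items)
    return items

print(bogosort([3, 1, 2]))  # [1, 2, 3]
```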