8/13/2019 Discrete Math and Probability Theory
1/133
CS 70 Discrete Mathematics and Probability Theory
Fall 2013 Vazirani Note 0
Review of Sets and Mathematical Notation
A set is a well-defined collection of objects. These objects are called elements or members of the set, and
they can be anything, including numbers, letters, people, cities, and even other sets. By convention, sets are
usually denoted by capital letters and can be described or defined by listing their elements and surrounding
the list by curly braces. For example, we can describe the set $A$ to be the set whose members are the first
five prime numbers, or we can explicitly write: $A = \{2, 3, 5, 7, 11\}$. If $x$ is an element of $A$, we write $x \in A$. Similarly, if $y$ is not an element of $A$, then we write $y \notin A$. Two sets $A$ and $B$ are said to be equal, written as $A = B$, if they have the same elements. The order and repetition of elements do not matter, so {red, white, blue} = {blue, white, red} = {red, white, white, blue}. Sometimes, more complicated sets can be defined by
using a different notation. For example, the set of all rational numbers, denoted by $\mathbb{Q}$, can be written as $\{\frac{a}{b} \mid a, b \text{ are integers}, b \neq 0\}$. In English, this is read as "the set of all fractions such that the numerator is an integer and the denominator is a non-zero integer."
Cardinality
We can also talk about the size of a set, or its cardinality. If $A = \{1, 2, 3, 4\}$, then the cardinality of $A$, denoted
by $|A|$, is 4. It is possible for the cardinality of a set to be 0. This set is called the empty set, denoted by the symbol $\emptyset$. A set can also have an infinite number of elements, such as the set of all integers, prime numbers,
or odd numbers.
Subsets and Proper Subsets
If every element of a set $A$ is also in set $B$, then we say that $A$ is a subset of $B$, written $A \subseteq B$. Equivalently
we can write $B \supseteq A$, or $B$ is a superset of $A$. A proper subset is a set $A$ that is strictly contained in $B$, written as $A \subset B$, meaning that $A$ excludes at least one element of $B$. For example, consider the set $B = \{1, 2, 3, 4, 5\}$. Then $\{1, 2, 3\}$ is both a subset and a proper subset of $B$, while $\{1, 2, 3, 4, 5\}$ is a subset but not a proper subset of $B$. Here are a few basic properties regarding subsets:
The empty set is a proper subset of any nonempty set $A$: $\emptyset \subset A$.
The empty set is a subset of every set $B$: $\emptyset \subseteq B$.
Every set $A$ is a subset of itself: $A \subseteq A$.
Intersections and Unions
The intersection of a set $A$ with a set $B$, written as $A \cap B$, is a set containing all elements which are in both $A$ and $B$. Two sets are said to be disjoint if $A \cap B = \emptyset$. The union of a set $A$ with a set $B$, written as $A \cup B$, is a set of all elements which are in either $A$ or $B$ or both. For example, if $A$ is the set of all positive even
numbers, and $B$ is the set of all positive odd numbers, then $A \cap B = \emptyset$, and $A \cup B = \mathbb{Z}^+$, or the set of all positive integers. Here are a few properties of intersections and unions:
$A \cup B = B \cup A$
CS 70, Fall 2013, Note 0 1
$A \cup \emptyset = A$
$A \cap B = B \cap A$
$A \cap \emptyset = \emptyset$
Complements
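If $A$ and $B$ are two sets, then the relative complement of $A$ in $B$, written as $B - A$ or $B \setminus A$, is the set of elements in $B$ but not in $A$: $B \setminus A = \{x \in B \mid x \notin A\}$. For example, if $B = \{1, 2, 3\}$ and $A = \{3, 4, 5\}$, then $B \setminus A = \{1, 2\}$. For another example, if $\mathbb{R}$ is the set of real numbers and $\mathbb{Q}$ is the set of rational numbers, then $\mathbb{R} \setminus \mathbb{Q}$ is the set of irrational numbers. Here are some important properties of complements:
$A \setminus A = \emptyset$
$A \setminus \emptyset = A$
$\emptyset \setminus A = \emptyset$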
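The set operations above map directly onto Python's built-in `set` type (`&` for intersection, `|` for union, `-` for relative complement). The sketch below replays the examples from the text; since a program cannot hold the infinite sets of positive evens and odds, it uses a hypothetical truncation to the numbers 1 through 20.

```python
# Relative complement example from the text: B \ A.
A = {3, 4, 5}
B = {1, 2, 3}
b_minus_a = B - A

# Positive evens and odds, truncated to 1..20 (a finite stand-in for the infinite sets).
evens = {n for n in range(1, 21) if n % 2 == 0}
odds = {n for n in range(1, 21) if n % 2 == 1}
evens_and_odds = evens & odds   # intersection: the two sets are disjoint
evens_or_odds = evens | odds    # union: every number in 1..20

# The listed properties, checked on concrete sets (a check, not a proof).
empty = set()
assert A | B == B | A and A & B == B & A
assert A | empty == A and A & empty == empty
assert A - A == empty and A - empty == A and empty - A == empty
```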
Significant Sets
In mathematics, some sets are referred to so commonly that they are denoted by special symbols. Some of
these numerical sets include:
$\mathbb{N}$ denotes the set of all natural numbers: $\{0, 1, 2, 3, \ldots\}$.
$\mathbb{Z}$ denotes the set of all integers: $\{\ldots, -2, -1, 0, 1, 2, \ldots\}$.
$\mathbb{Q}$ denotes the set of all rational numbers: $\{\frac{a}{b} \mid a, b \in \mathbb{Z}, b \neq 0\}$.
$\mathbb{R}$ denotes the set of all real numbers.
$\mathbb{C}$ denotes the set of all complex numbers.
In addition, the Cartesian product (also called the cross product) of two sets $A$ and $B$, written as $A \times B$, is the set of all pairs whose first component is an element of $A$ and whose second component is an element
of $B$. In set notation, $A \times B = \{(a, b) \mid a \in A, b \in B\}$. For example, if $A = \{1, 2, 3\}$ and $B = \{u, v\}$, then $A \times B = \{(1, u), (1, v), (2, u), (2, v), (3, u), (3, v)\}$. Given a set $S$, another significant set is the power set of $S$, denoted by $\mathcal{P}(S)$, which is the set of all subsets of $S$: $\{T \mid T \subseteq S\}$. For example, if $S = \{1, 2, 3\}$, then the power set of $S$ is $\mathcal{P}(S) = \{\{\}, \{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}\}$. It is interesting to note that if $|S| = k$, then $|\mathcal{P}(S)| = 2^k$.
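A small sketch of both constructions using Python's standard `itertools` module: `product` enumerates the pairs of $A \times B$, and `combinations` of every size enumerates the subsets of $S$. The helper name `power_set` is our own; subsets are stored as `frozenset`s so that they can be members of a set.

```python
from itertools import chain, combinations, product


def power_set(s):
    """All subsets of s, each as a frozenset (so they can be members of a set)."""
    items = list(s)
    subsets_by_size = (combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(c) for c in chain.from_iterable(subsets_by_size)}


A = {1, 2, 3}
B = {"u", "v"}
A_times_B = set(product(A, B))  # every pair (a, b) with a in A and b in B
P = power_set({1, 2, 3})
```

Checking `len(power_set(range(k)))` for small `k` is a quick way to see the $2^k$ pattern, though of course not a proof of it.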
Mathematical notation:
Sums and Products:
There is a compact notation for writing sums or products of large numbers of items. For example, to write
$1 + 2 + \cdots + n$ without having to say "dot dot dot", we write it as $\sum_{i=1}^{n} i$. More generally, we can write the sum $f(m) + f(m+1) + \cdots + f(n)$ as $\sum_{i=m}^{n} f(i)$. Thus,
$$\sum_{i=5}^{n} i^2 = 5^2 + 6^2 + \cdots + n^2.$$
To write the product $f(m) f(m+1) \cdots f(n)$, we use the notation $\prod_{i=m}^{n} f(i)$. For example,
$$\prod_{i=1}^{n} i = 1 \cdot 2 \cdots n.$$
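In code, $\sum$ and $\prod$ are just loops over $i = m, \ldots, n$. This sketch uses Python's built-in `sum` and `math.prod` (Python 3.8+); the helper names `big_sum` and `big_product` are hypothetical. By the usual convention, which both built-ins follow, an empty sum ($n < m$) is 0 and an empty product is 1.

```python
import math


def big_sum(f, m, n):
    """sum_{i=m}^{n} f(i); an empty range (n < m) gives 0."""
    return sum(f(i) for i in range(m, n + 1))


def big_product(f, m, n):
    """prod_{i=m}^{n} f(i); an empty range (n < m) gives 1."""
    return math.prod(f(i) for i in range(m, n + 1))
```

For instance, `big_sum(lambda i: i ** 2, 5, n)` is $\sum_{i=5}^{n} i^2$, and `big_product(lambda i: i, 1, n)` is $1 \cdot 2 \cdots n$.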
Universal and existential quantifiers:
Consider the statement: "For all natural numbers $n$, $n^2 + n + 41$ is prime." Here, $n$ is quantified to any element of the set $\mathbb{N}$ of natural numbers. In notation, we write $(\forall n \in \mathbb{N})(n^2 + n + 41 \text{ is prime})$. Here we have used the universal quantifier $\forall$ ("for all"). Is the statement true? If you try to substitute small values of $n$, you will notice that $n^2 + n + 41$ is indeed prime for those values. But if you think harder, you can find larger values of $n$ for which it is not prime. Can you find one? So the statement $(\forall n \in \mathbb{N})(n^2 + n + 41 \text{ is prime})$ is false.
The existential quantifier $\exists$ ("there exists") is used in the following statements:
1. $(\forall x \in \mathbb{Z})(\exists y \in \mathbb{Z})(y > x)$.
2. $(\exists y \in \mathbb{Z})(\forall x \in \mathbb{Z})(y > x)$.
The first statement says that, given an integer, I can find a larger one. The second statement says something
very different: that there is a largest integer! The first statement is true, the second is not.
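A brute-force search can refute a universal statement by exhibiting one counterexample, though no finite search can prove such a statement. The sketch below (helper names are our own; trial division is only adequate for small numbers) looks for the smallest $n$ for which $n^2 + n + 41$ is not prime, if you want to check your answer to the question above.

```python
def is_prime(k):
    """Trial division primality test; fine for the small numbers used here."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def first_counterexample(limit=1000):
    """Smallest n in 0..limit-1 with n^2 + n + 41 not prime, or None if none is found."""
    for n in range(limit):
        if not is_prime(n * n + n + 41):
            return n
    return None
```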
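Inductive Step: Prove that it also holds for $n = k + 1$, i.e., $\sum_{i=0}^{k+1} i = \frac{(k+1)(k+2)}{2}$:
$$\begin{aligned}
\sum_{i=0}^{k+1} i &= \left(\sum_{i=0}^{k} i\right) + (k+1) \\
&= \frac{k(k+1)}{2} + (k+1) \quad \text{(by the inductive hypothesis)} \\
&= (k+1)\left(\frac{k}{2} + 1\right) \\
&= \frac{(k+1)(k+2)}{2}.
\end{aligned}$$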
Hence, by the principle of induction, the theorem holds.
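As a sanity check (not a substitute for the proof), the following sketch compares both sides of the identity for many values of $n$ and replays the algebra of the inductive step, $\frac{k(k+1)}{2} + (k+1) = \frac{(k+1)(k+2)}{2}$. The function names are illustrative.

```python
def lhs(n):
    """0 + 1 + ... + n, computed directly."""
    return sum(range(n + 1))


def rhs(n):
    """n(n+1)/2; n(n+1) is always even, so integer division is exact."""
    return n * (n + 1) // 2


# P(n) holds for every n we try...
assert all(lhs(n) == rhs(n) for n in range(500))
# ...and the inductive step's algebra: rhs(k) + (k + 1) == rhs(k + 1).
assert all(rhs(k) + (k + 1) == rhs(k + 1) for k in range(500))
```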
Let's step back and look at the general form of such a proof, and also why it makes sense. Let us denote
by $P(n)$ the statement $\sum_{i=0}^{n} i = \frac{n(n+1)}{2}$. So we wish to prove that $\forall n \in \mathbb{N}, P(n)$. The principle of induction
asserts that you can prove $P(n)$ is true for all $n \in \mathbb{N}$ by following these three steps:
Base Case: Prove that $P(0)$ is true.
Inductive Hypothesis: Assume that $P(k)$ is true.
Inductive Step: Show that it follows that $P(k+1)$ is true.
To understand why induction works, think of the statements $P(n)$ as represented by a sequence of dominoes, numbered $0, 1, 2, \ldots, n$, such that $P(0)$ corresponds to the 0th domino, $P(1)$ corresponds to the 1st
domino, and so on. The dominoes are lined up so that if the $k$th domino is knocked over, then it in turn knocks over the $(k+1)$st. Knocking over the $k$th domino corresponds to proving $P(k)$ is true, and the induction step corresponds to the placement of the dominoes to ensure that if the $k$th domino falls, in turn
it knocks over the $(k+1)$st domino. The base case ($n = 0$) knocks over the 0th domino, setting off a chain reaction that knocks down all the dominoes.
It is worth examining more closely the induction proof example above. To prove $P(k+1)$, we find within it
the statement $P(k)$: $\sum_{i=0}^{k+1} i = \left(\sum_{i=0}^{k} i\right) + (k+1)$. This is the key to the induction step.
We will now look at another proof by induction, but first we will introduce some notation and a definition
for divisibility. Given integers $a$ and $b$, we say that $a$ divides $b$ (or $b$ is divisible by $a$), written as $a \mid b$, if and only if for some integer $q$, $b = aq$. In mathematical notation, $\forall a, b \in \mathbb{Z}$, $a \mid b \iff \exists q \in \mathbb{Z} : b = aq$.
CS 70, Fall 2013, Note 1 2
Theorem: $\forall n \in \mathbb{N}$, $n^3 - n$ is divisible by 3.
Proof (by induction over $n$): Let $P(n)$ denote the statement "$n^3 - n$ is divisible by 3".
Base Case: $P(0)$ asserts that $3 \mid (0^3 - 0)$, or $3 \mid 0$, which is true since any non-zero integer divides 0. (In this case, $0 = 3 \cdot 0$.)
Inductive Hypothesis: Assume $P(k)$ is true. That is, $3 \mid (k^3 - k)$, or $\exists q \in \mathbb{Z}$, $k^3 - k = 3q$.
Inductive Step: We must show that $P(k+1)$ is true, which asserts that $3 \mid ((k+1)^3 - (k+1))$. Let us expand this out:
$$\begin{aligned}
(k+1)^3 - (k+1) &= k^3 + 3k^2 + 3k + 1 - (k+1) \\
&= (k^3 - k) + 3k^2 + 3k \\
&= 3q + 3(k^2 + k), \quad q \in \mathbb{Z} \quad \text{(by the inductive hypothesis)} \\
&= 3(q + k^2 + k).
\end{aligned}$$
So $3 \mid ((k+1)^3 - (k+1))$.
Hence, by the principle of induction, $\forall n \in \mathbb{N}$, $3 \mid (n^3 - n)$.
There is a clever direct proof without any induction for the above statement. Can you see it?
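The divisibility definition translates into a remainder test: for $a \neq 0$, $a \mid b$ exactly when $b \bmod a = 0$. The sketch below (the helper `divides` is hypothetical and assumes $a \neq 0$) checks the theorem on a finite range and replays the identity used in the inductive step.

```python
def divides(a, b):
    """a | b, i.e. b = a*q for some integer q. Assumes a != 0."""
    return b % a == 0


# The theorem on a finite range (evidence, not a proof).
assert all(divides(3, n**3 - n) for n in range(1000))
# The identity behind the inductive step:
# (k+1)^3 - (k+1) == (k^3 - k) + 3(k^2 + k)
assert all((k + 1)**3 - (k + 1) == (k**3 - k) + 3 * (k**2 + k) for k in range(1000))
```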
Two Color Theorem: There is a famous theorem called the four color theorem. It states that any map
can be colored with four colors such that any two adjacent countries (which share a border, but not just
a point) must have different colors. The four color theorem is very difficult to prove, and several bogus
proofs were claimed after the problem was first posed in 1852. It was not until 1976 that the theorem was
finally proved (with the aid of a computer) by Appel and Haken. (For an interesting history of the problem,
and a state-of-the-art proof, which is nonetheless still very challenging, see www.math.gatech.edu/~thomas/FC/fourcolor.html). We consider a simpler scenario, where we divide the plane
into regions by drawing lines, where each line divides the plane into two regions (i.e. it extends to infinity).
We want to know if we can color this map using no more than two colors (say, red and blue) such that no
two regions that share a boundary have the same color. Here is an example of a two-colored map:
We will prove this "two color theorem" by induction on $n$, the number of lines:
Base Case: Prove that $P(0)$ is true, which is the proposition that a map with $n = 0$ lines can be colored using no more than two colors. But this is easy, since we can just color the entire plane using
one color.
Inductive Hypothesis: Assume $P(n)$. That is, a map with $n$ lines can be two-colored.
Inductive Step: Prove $P(n+1)$. We are given a map with $n+1$ lines and wish to show that it can be two-colored. Let's see what happens if we remove a line. With only $n$ lines on the plane, we know
we can two-color the map (by the inductive hypothesis). Let us make the following observation: if
we swap red $\leftrightarrow$ blue, we still have a two-coloring. With this in mind, let us place back the line we removed, and leave colors on one side of the line unchanged. On the other side of the line, swap red $\leftrightarrow$
blue. We claim that this is a valid two-coloring for the map with $n+1$ lines.
Why does this work? Any border of a region either consists of a part of one of the original $n$ lines or
a piece of the $(n+1)$st line. If it is a part of one of the original $n$ lines, then the two regions on either side are both on the same side of the $(n+1)$st line, so either both kept their colors or both were swapped; since their colors were distinct by
the induction hypothesis, they are still distinct. On the other hand, if the border is part of the $(n+1)$st line, then the two regions were created by dividing a single region from the induction hypothesis, and by construction we reversed colors on one side of the line, and so they have opposite colors.
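The inductive construction can be unrolled into an explicit rule. If we start from a single color and, each time a line is added, always swap the colors on its "positive" side, then a region's color is just the parity of the number of lines whose positive side contains it. The sketch below assumes each line is given as a hypothetical triple $(a, b, c)$ standing for $ax + by + c = 0$, and that the query point lies on no line.

```python
def region_color(point, lines):
    """Color of the region containing `point` in a map cut by straight lines.

    Each line is a triple (a, b, c) standing for a*x + b*y + c = 0. Adding a line
    and swapping colors on its positive side (a*x + b*y + c > 0) flips the parity
    of this count for exactly the regions on that side, so the parity is a
    valid two-coloring. Assumes the point does not lie on any line.
    """
    x, y = point
    positive_sides = sum(1 for a, b, c in lines if a * x + b * y + c > 0)
    return "red" if positive_sides % 2 == 0 else "blue"
```

Two points just on opposite sides of any one line differ only in that line's count, so they always receive different colors.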
Induction is a very powerful technique. But you will need to exercise care while using it, since even small
errors can lead to proving ridiculously false statements. Here is a dramatic example: in the middle of the
last century, a colloquial expression in common use was "that is a horse of a different color", referring to
something that is quite different from normal or common expectation. The famous mathematician George
Polya (who was also a great expositor of mathematics for the lay public) gave the following proof to show
that there is no horse of a different color!
Theorem: All horses are the same color.
Proof(by induction on the number of horses):
Base Case: $P(1)$ is certainly true, since if you have a set containing just one horse, all horses in the set have the same color.
Inductive Hypothesis: Assume $P(n)$, which is the statement that in any set of $n$ horses, they all have the same color.
Inductive Step: Given a set of $n+1$ horses $\{h_1, h_2, \ldots, h_{n+1}\}$, we can exclude the last horse in the set and apply the inductive hypothesis just to the first $n$ horses $\{h_1, \ldots, h_n\}$, deducing that they all have
the same color. Similarly, we can conclude that the last $n$ horses $\{h_2, \ldots, h_{n+1}\}$ all have the same color. But now the "middle" horses $\{h_2, \ldots, h_n\}$ (i.e., all but the first and the last) belong to both of these sets, so they have the same color as horse $h_1$ and horse $h_{n+1}$. It follows, therefore, that all $n+1$ horses have the same color. Thus, by the principle of induction, all horses have the same color.
Clearly, it is not true that all horses are of the same color, so where did we go wrong in our induction proof?
It is tempting to blame the induction hypothesis, which is clearly false. But the whole point of induction
is that if the base case is true (which it is in this case), and assuming the induction hypothesis for any $n$ we
can prove the case $n+1$, then the statement is true for all $n$. So what we are looking for is a flaw in the reasoning!
What makes the flaw in this proof a little tricky to spot is that the induction step is valid for a "typical" value
of $n$, say, $n = 3$. The flaw, however, is in the induction step when $n = 1$. In this case, for $n + 1 = 2$ horses, there are no "middle" horses, and so the argument completely breaks down!
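The broken case can be made concrete. The sketch below (horse names are illustrative) computes the overlap between the "first $n$" and "last $n$" horses out of $n + 1$. The argument needs this overlap to be non-empty to connect $h_1$ to $h_{n+1}$, and it is empty exactly when $n = 1$.

```python
def middle_horses(n):
    """Horses shared by the first n and the last n of the horses h1..h(n+1)."""
    horses = [f"h{i}" for i in range(1, n + 2)]
    first_n = set(horses[:n])  # h1 .. hn
    last_n = set(horses[1:])   # h2 .. h(n+1)
    return first_n & last_n
```

In general the overlap has $n - 1$ horses, so the step from $n = 1$ to $n + 1 = 2$ is the only one that fails.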
Strengthening the Inductive Hypothesis
Let us prove by induction the following proposition:
Theorem: $\forall n \ge 1$, the sum of the first $n$ odd numbers is a perfect square.
Proof: By induction on $n$.
Base Case: $n = 1$. The first odd number is 1, which is a perfect square.
Inductive Hypothesis: Assume that the sum of the first $k$ odd numbers is a perfect square, say $m^2$.
Inductive Step: The $(k+1)$st odd number is $2k + 1$, so by the induction hypothesis, the sum of the first $k+1$ odd numbers is $m^2 + 2k + 1$. But now we are stuck. Why should $m^2 + 2k + 1$ be a perfect square?
Well, let's just take a detour and compute the values of the first few cases. Maybe we will identify another
pattern.
$n = 1$: $1 = 1^2$ is a perfect square.
$n = 2$: $1 + 3 = 4 = 2^2$ is a perfect square.
$n = 3$: $1 + 3 + 5 = 9 = 3^2$ is a perfect square.
$n = 4$: $1 + 3 + 5 + 7 = 16 = 4^2$ is a perfect square.
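The detour can be carried a little further mechanically. This short sketch prints the sum of the first $n$ odd numbers for a few more values of $n$, which makes the pattern easier to spot (a table of values suggests a conjecture; it does not prove one).

```python
def sum_of_first_odds(n):
    """1 + 3 + ... + (2n - 1), the sum of the first n odd numbers."""
    return sum(2 * i - 1 for i in range(1, n + 1))


for n in range(1, 9):
    print(f"n={n}: {sum_of_first_odds(n)}")
```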
Wait, isn't there a pattern where the sum of the first $n$ odd numbers is just $n^2$? Here is an idea: let us show
something stronger!
Theorem: For all $n \ge 1$, the sum of the first $n$ odd numbers is $n^2$.
Proof: By induction on $n$.
Base Case: $n = 1$. The first odd number is 1, which is $1^2$.
Inductive Hypothesis: Assume that the sum of the first $k$ odd numbers is $k^2$.
Inductive Step: The $(k+1)$st odd number is $2k + 1$, so by the induction hypothesis the sum of the first $k+1$ odd numbers is $k^2 + (2k + 1) = (k+1)^2$. Thus by the principle of induction the theorem holds.
See if you can understand what happened here. We could not prove a proposition, so we proved a harder
proposition instead! Can you see why that can sometimes be easier when you are doing a proof by induction?
When you are trying to prove a stronger statement by induction, you have to show something harder in the induction step, but you also get to assume something stronger in the induction hypothesis. Sometimes the
stronger assumption helps you reach just that much further...
Here is another example:
Imagine that we are given L-shaped tiles (i.e., a $2 \times 2$ square tile with a missing $1 \times 1$ square), and we want to know if we can tile a $2^n \times 2^n$ courtyard with a missing $1 \times 1$ square in the middle. Here is an example of a successful tiling in the case that $n = 2$:
Let us try to prove the proposition by induction on $n$.
Base Case: Prove $P(1)$. This is the proposition that a $2 \times 2$ courtyard can be tiled with L-shaped tiles with a missing $1 \times 1$ square in the middle. But this is easy:
Inductive Hypothesis: Assume $P(n)$ is true. That is, we can tile a $2^n \times 2^n$ courtyard with a missing $1 \times 1$ square in the middle.
Inductive Step: We want to show that we can tile a $2^{n+1} \times 2^{n+1}$ courtyard with a missing $1 \times 1$ square in the middle. Let's try to reduce this problem so we can apply our inductive hypothesis. A $2^{n+1} \times 2^{n+1}$
courtyard can be broken up into four smaller courtyards of size $2^n \times 2^n$, each with a missing $1 \times 1$ square as follows:
But the holes are not in the middle of each $2^n \times 2^n$ courtyard, so the inductive hypothesis does not help! How should we proceed? We should strengthen our inductive hypothesis!
What we are about to do is completely counter-intuitive. It's like attempting to lift 100 pounds, failing, and
then saying "I couldn't lift 100 pounds. Let me try to lift 200," and then succeeding! Instead of proving that
we can tile a $2^n \times 2^n$ courtyard with a hole in the middle, we will try to prove something stronger: that we can tile the courtyard with the hole being anywhere we choose. It is a trade-off: we have to prove more, but
we also get to assume a stronger hypothesis. The base case is the same, so we will just work on the inductive
hypothesis and step.
Inductive Hypothesis (second attempt): Assume $P(n)$ is true, so that we can tile a $2^n \times 2^n$ courtyard
with a missing $1 \times 1$ square anywhere.
Inductive Step (second attempt): As before, we can break up the $2^{n+1} \times 2^{n+1}$ courtyard as follows.
By placing the first tile as shown, we get four $2^n \times 2^n$ courtyards, each with a $1 \times 1$ hole; three of these courtyards have the hole in one corner, while the fourth has the hole in a position determined by
the hole in the $2^{n+1} \times 2^{n+1}$ courtyard. The stronger inductive hypothesis now applies to each of these four courtyards, so that each of them can be successfully tiled. Thus, we have proven that we can tile
a $2^{n+1} \times 2^{n+1}$ courtyard with a hole anywhere! Hence, by the induction principle, we have proved the (stronger) theorem.
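The stronger hypothesis is exactly what makes a recursive algorithm work. The sketch below is one possible implementation, not the notes' own: to tile a block with a hole anywhere, it places one L-tile on the three centre cells belonging to the quadrants that do not contain the hole, then recurses on each quadrant with its own hole (the real one, or the cell just covered). The function name and the grid encoding (hole as `-1`, tiles numbered from 1) are our own choices.

```python
def tile_courtyard(n, hole):
    """Tile a 2^n x 2^n grid with L-shaped tiles, leaving the cell `hole` = (row, col) empty.

    Returns the grid as a list of rows: the hole is -1 and the three cells of
    each tile share a label 1, 2, 3, ...
    """
    size = 2 ** n
    grid = [[0] * size for _ in range(size)]
    grid[hole[0]][hole[1]] = -1
    labels = [0]  # mutable counter shared with the nested function

    def solve(top, left, block, hr, hc):
        """Tile the block x block square at (top, left) whose only gap is (hr, hc)."""
        if block == 1:
            return  # base case: the single cell is the gap itself
        labels[0] += 1
        label = labels[0]
        half = block // 2
        quadrants = [(top, left), (top, left + half),
                     (top + half, left), (top + half, left + half)]
        # In each quadrant, the cell that touches the centre of the block.
        centre_cells = [(top + half - 1, left + half - 1), (top + half - 1, left + half),
                        (top + half, left + half - 1), (top + half, left + half)]
        for (qr, qc), (cr, cc) in zip(quadrants, centre_cells):
            if qr <= hr < qr + half and qc <= hc < qc + half:
                solve(qr, qc, half, hr, hc)  # this quadrant contains the real gap
            else:
                grid[cr][cc] = label         # one cell of the central L-tile
                solve(qr, qc, half, cr, cc)  # treat it as this quadrant's gap

    solve(0, 0, size, hole[0], hole[1])
    return grid
```

Each call with `block > 1` places exactly one tile, so a $2^n \times 2^n$ courtyard uses $(4^n - 1)/3$ tiles, which is a quick consistency check on the output.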
Strong Induction
Strong induction is very similar to simple induction, with the exception of the inductive hypothesis. With
strong induction, instead of just assuming $P(k)$ is true, you assume the stronger statement that $P(0), P(1), \ldots,$ and $P(k)$ are all true (i.e., $P(0) \wedge P(1) \wedge \cdots \wedge P(k)$ is true, or in more compact notation
$\bigwedge_{i=0}^{k} P(i)$ is true).
Strong induction sometimes makes the proof of the inductive step much easier since we get to assume a stronger statement, as illustrated in the next example.
Theorem: Every natural number $n > 1$ can be written as a product of primes.
Recall that a number $n \ge 2$ is prime if 1 and $n$ are its only divisors. Let $P(n)$ be the proposition that $n$ can be written as a product of primes. We will prove that $P(n)$ is true for all $n \ge 2$.
Base Case: We start at $n = 2$. Clearly $P(2)$ holds, since 2 is a prime number.
Inductive Hypothesis: Assume $P(k)$ is true for $2 \le k \le n$: i.e., every number $k$ with $2 \le k \le n$ can be written as a product of primes.
Inductive Step: We must show that $n + 1$ can be written as a product of primes. We have two cases: either $n + 1$ is a prime number, or it is not. For the first case, if $n + 1$ is a prime number, then we are done. For the second case, if $n + 1$ is not a prime number, then by definition $n + 1 = xy$, where $x, y \in \mathbb{Z}^+$ and 1
Why does this proof fail if we were to use simple induction? If we only assume $P(n)$ is true, then we cannot apply our inductive hypothesis to $x$ and $y$. For example, if we were trying to prove $P(42)$, we might write $42 = 6 \times 7$, and then it is useful to know that $P(6)$ and $P(7)$ are true. However, with simple induction, we could only assume $P(41)$, i.e., that 41 can be written as a product of primes, a fact that is not useful in establishing $P(42)$.
To understand why strong induction works, let's think about our domino analogy. By the time we're ready
for the $(k+1)$st domino to fall, dominoes numbered 0 through $k$ have already been knocked over. But this is exactly what strong induction assumes: to prove $P(k+1)$, we can assume we already know that $P(0)$ through $P(k)$ are true.
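The strong induction proof is constructive, so it reads directly as a recursive algorithm: if $n$ is prime, return it; otherwise split $n = xy$ with smaller factors and recurse on both, which is justified only because the hypothesis covers every $k \le n$, not just $n$. In this sketch (helper names are hypothetical) $x$ is chosen as the smallest factor, but the recursion would work for any split.

```python
def smallest_factor(n):
    """Smallest divisor d >= 2 of n (n itself when n is prime). Assumes n >= 2."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def prime_factors(n):
    """Prime factorization of n >= 2, mirroring the strong-induction proof."""
    x = smallest_factor(n)
    if x == n:  # case 1: n is prime
        return [n]
    y = n // x  # case 2: n = x * y with 2 <= x, y < n
    return prime_factors(x) + prime_factors(y)  # both calls are on smaller numbers
```

For the example in the text, `prime_factors(42)` splits off 2 and then recurses on 21, a number that simple induction from $P(41)$ would not let us use.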
Simple Induction vs. Strong Induction
We have seen that strong induction makes certain proofs easy when simple induction seems to fail. A natural
question to ask then, is whether the strong induction axiom is logically stronger than the simple induction
axiom. In fact, the two methods of induction are logically equivalent. Clearly anything that can be proven by
simple induction can also be proven by strong induction (convince yourself of this!). For the other direction,
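suppose we can prove by strong induction that $\forall n\, P(n)$. Let $Q(k) = P(0) \wedge \cdots \wedge P(k)$. Let us prove $\forall k\, Q(k)$ by simple induction. The proof is modeled after the strong induction proof of $\forall n\, P(n)$. The base case $Q(0)$ is just $P(0)$. For the inductive step, we want to show $Q(k) \implies Q(k+1)$, or equivalently $P(0) \wedge \cdots \wedge P(k) \implies P(0) \wedge \cdots \wedge P(k) \wedge P(k+1)$. But this is true iff $P(0) \wedge \cdots \wedge P(k) \implies P(k+1)$. This is exactly what the strong induction proof of $\forall n\, P(n)$ establishes! Therefore, we can establish $\forall n\, Q(n)$ by simple induction. And clearly, proving $\forall n\, Q(n)$ also proves $\forall n\, P(n)$.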
Well Ordering Principle
In the context of proving statements about algorithms or programs, it is often convenient to formulate an
induction proof in a different way. We start by asking how the statement $\forall n \in \mathbb{N}, P(n)$ could fail. Well, it means that there must be some values of $n$ for which $P(n)$ is false. Let $m$ be the smallest such natural number. We know that $m$ must be greater than 0 since $P(0)$ is true (base case), which indicates $m - 1 \in \mathbb{N}$. Since $m$ is the smallest input that makes $P(m)$ false, $P(m-1)$ must be true. But $P(m-1) \implies P(m)$ (inductive step), which is a contradiction.
We assumed something when defining $m$ that is usually taken for granted: that we can actually find a smallest
number in any nonempty subset of natural numbers. This property does not hold for, say, the real numbers; to see why,
consider the set $\{x \in \mathbb{R} : 0$
Let's look at an example.
Round robin tournament: Suppose that, in a round robin tournament, we have a set of $k$ players $\{p_1, p_2, \ldots, p_k\}$ such that $p_1$ beats $p_2$, $p_2$ beats $p_3$, $\ldots$, $p_{k-1}$ beats $p_k$, and $p_k$ beats $p_1$. This is called a cycle in the tournament:
(A round robin tournament is a tournament where each participant plays every other contestant exactly once.
Thus, if there are $n$ players, there will be exactly $\frac{n(n-1)}{2}$ matches. Also, we are assuming that every match
ends in either a win or a loss; no ties.)
Claim: If there exists a cycle in a tournament, then there exists a cycle of length 3.
Proof: For the base case, notice that we cannot have a cycle of length less than 3, and if there is a cycle of
length 3 then the proposition is true.
Assume for contradiction that the smallest cycle is:
with $n > 3$. Let us look at the game between $p_1$ and $p_3$. We have two cases: either $p_3$ beats $p_1$, or $p_1$ beats $p_3$. In the first case (where $p_3$ beats $p_1$), we are done because we have a 3-cycle. In the second case
(where $p_1$ beats $p_3$), we have a shorter cycle $\{p_1, p_3, p_4, \ldots, p_n\}$ and thus a contradiction. Therefore, if there exists a cycle, then there must exist a 3-cycle as well.
Can we prove this claim using more traditional induction? Let us start with the base case of $n = 3$ players and proceed from there.
Proof: By induction on n.
Base Case: As above.
Inductive Hypothesis: If a round-robin tournament has a cycle of length $k$, then it has a cycle of length 3.
Inductive Step: Given a round-robin tournament with a cycle of length $k+1$, we wish to show there must be a 3-cycle. Assume without loss of generality that the cycle involves players $p_1$ through $p_{k+1}$ in that order. Consider
the outcome of the match between $p_1$ and $p_3$. If $p_3$ beats $p_1$, then we have a 3-cycle. If $p_1$ beats $p_3$,
there is a $k$-cycle that goes directly from $p_1 \to p_3$ and continues as before. Applying the induction hypothesis, we conclude that there must be a 3-cycle in the tournament.
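The inductive step is also an algorithm: while the cycle is longer than 3, either $p_3$ beats $p_1$ (a 3-cycle) or $p_1$ beats $p_3$ and $p_2$ can be skipped. In this sketch the tournament is represented as a hypothetical set of `(winner, loser)` pairs, and the 5-player results are made up for illustration.

```python
def find_three_cycle(beats, cycle):
    """Shrink a cycle in a tournament to a 3-cycle, as in the inductive step.

    beats: set of (winner, loser) pairs, one per match.
    cycle: players [p1, ..., pk], k >= 3, where each beats the next and pk beats p1.
    """
    cycle = list(cycle)
    while len(cycle) > 3:
        p1, p2, p3 = cycle[0], cycle[1], cycle[2]
        if (p3, p1) in beats:
            return [p1, p2, p3]  # p1 -> p2 -> p3 -> p1 is a 3-cycle
        cycle.pop(1)             # p1 beats p3: skip p2, leaving a shorter cycle
    return cycle


def is_cycle(beats, cycle):
    """True if each player in `cycle` beats the next one, wrapping around."""
    return all((cycle[i], cycle[(i + 1) % len(cycle)]) in beats for i in range(len(cycle)))


# Hypothetical 5-player tournament containing the cycle 0 -> 1 -> 2 -> 3 -> 4 -> 0.
beats = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 0),
         (0, 2), (0, 3), (1, 3), (1, 4), (2, 4)}
```

On this tournament, `find_three_cycle(beats, [0, 1, 2, 3, 4])` skips players 1 and 2 (since 0 beats both 2 and 3) and ends with the 3-cycle `[0, 3, 4]`.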
Induction and Recursion
There is an intimate connection between induction and recursion in mathematics and computer science. A
recursive definition of a function over the natural numbers specifies the value of the function at small values
of $n$, and defines the value of $f(n)$ for a general $n$ in terms of the value of $f(m)$ for m
Can you figure out how long this program takes to compute $F(n)$? This is a very inefficient way to compute the $n$-th Fibonacci number. A much faster way is to turn this into an iterative algorithm (this should be a
familiar example of turning a tail-recursion into an iterative algorithm):
function F2(n)
    if n = 0 then return 0
    if n = 1 then return 1
    a = 1
    b = 0
    for k = 2 to n do
        temp = a
        a = a + b
        b = temp
    return a
Can you show by induction that this new function satisfies $F2(n) = F(n)$? How long does this program take to compute $F(n)$?
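Here is a runnable sketch of both versions in Python. The naive recursive `F` assumes the standard definition $F(0) = 0$, $F(1) = 1$, $F(n) = F(n-1) + F(n-2)$, consistent with the base cases of F2; `F2` follows the pseudocode above, with tuple assignment replacing `temp`. Comparing them on small inputs is evidence for $F2(n) = F(n)$, not the requested inductive proof.

```python
def F(n):
    """Naive recursive Fibonacci: F(0) = 0, F(1) = 1, F(n) = F(n-1) + F(n-2)."""
    if n < 2:
        return n
    return F(n - 1) + F(n - 2)


def F2(n):
    """Iterative Fibonacci following the F2 pseudocode."""
    if n == 0:
        return 0
    a, b = 1, 0          # a = F(1), b = F(0)
    for _ in range(2, n + 1):
        a, b = a + b, a  # after step k: a = F(k), b = F(k-1)
    return a
```

The loop invariant in the comment ($a = F(k)$, $b = F(k-1)$ after step $k$) is exactly the statement one would prove by induction on $k$. `F` makes a number of calls that grows exponentially in $n$, while `F2` does $n - 1$ loop iterations.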
Clearly, induction and recursion are closely related. In fact, proofs involving a recursively-defined concept,
e.g., factorial, are often best done using induction. Formally, the factorial of a nonnegative number $n$ is
defined recursively as $n! = n \cdot (n-1)!$ with base case $0! = 1$, so that $n! = n(n-1)(n-2)\cdots 1$, whereas exponentiation is defined recursively as $x^n = x^{n-1} \cdot x$. In the next example, we will look at an inequality between two functions of $n$. Such inequalities are useful in computer science when showing that one algorithm is more efficient than
another.
Notice that for this example, we have chosen as our base case $n = 2$ rather than $n = 0$. This is because the statement is trivially true forn1= n!
Practice Problems
1. Prove for any natural number $n$ that $1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{1}{6} n(n+1)(2n+1)$.
2. Prove that $3^n > 2^n$ for all natural numbers $n \ge 1$.
3. In real analysis, Bernoulli's Inequality is an inequality which approximates the exponentiations of
$1 + x$. Prove this inequality, which states that $(1 + x)^n \ge 1 + nx$ if $n$ is a natural number and $1 + x > 0$.
CS 70 Discrete Mathematics and Probability Theory
Fall 2013 Vazirani Note 2
The Stable Marriage Problem: Induction Proofs in Algorithms
A dating agency must match up $n$ men and $n$ women. Each man has an ordered preference list of the $n$
women, and each woman has a similar list of the $n$ men. Is there a good algorithm to pair them up?
Consider for example $n = 3$ men represented by numbers 1, 2, and 3 and 3 women A, B, and C, with the following preference lists:
Men Women
1 A B C
2 B A C
3 A B C
Women Men
A 2 1 3
B 1 2 3
C 1 2 3
There are many possible pairings for this example, two of which are {(1,A), (2,B), (3,C)} and {(1,B), (2,C),
(3,A)}. How do we decide which pairing to choose? Let us look at an algorithm for this problem that is
simple, fast, and widely-used.
The Stable Marriage Algorithm¹
Every Morning: Each man goes to the first woman on his list not yet crossed off and proposes to her.
Every Afternoon: Each woman says "maybe, come back tomorrow" to the man she likes best among
the proposals (she now has him on a string) and "never" to all the rest.
Every Evening: Each rejected suitor crosses off the woman who rejected him from his list.
The above loop is repeated each successive day until there are no more rejected suitors. On this day,
each woman marries the man she has on a string.
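The day-by-day loop can be simulated directly. The sketch below is one straightforward transcription, assuming complete preference lists given as Python dicts (best choice first); the function and variable names are our own. It also counts the days, which the halting argument later in this note bounds by $n^2$.

```python
def stable_marriage(men_prefs, women_prefs):
    """Run the morning/afternoon/evening loop until no suitor is rejected.

    Returns (set of (man, woman) pairs, number of days taken).
    """
    # rank[w][m]: position of man m on woman w's list (smaller is better)
    rank = {w: {m: i for i, m in enumerate(lst)} for w, lst in women_prefs.items()}
    # next_idx[m]: index of the first woman on m's list not yet crossed off
    next_idx = {m: 0 for m in men_prefs}
    days = 0
    while True:
        days += 1
        # Morning: every man proposes to the first woman not crossed off his list.
        proposals = {}
        for m, lst in men_prefs.items():
            proposals.setdefault(lst[next_idx[m]], []).append(m)
        # Afternoon: each woman keeps her favourite suitor "on a string".
        on_string, rejected = {}, []
        for w, suitors in proposals.items():
            best = min(suitors, key=lambda m: rank[w][m])
            on_string[w] = best
            rejected += [m for m in suitors if m != best]
        if not rejected:  # no rejections: each woman marries the man on her string
            return {(m, w) for w, m in on_string.items()}, days
        # Evening: each rejected man crosses off the woman who rejected him.
        for m in rejected:
            next_idx[m] += 1


men_prefs = {1: ["A", "B", "C"], 2: ["B", "A", "C"], 3: ["A", "B", "C"]}
women_prefs = {"A": [2, 1, 3], "B": [1, 2, 3], "C": [1, 2, 3]}
pairing, days = stable_marriage(men_prefs, women_prefs)
```

On the example lists it finishes on day 3 with {(1, A), (2, B), (3, C)}: man 3 is rejected by A on day 1 and by B on day 2.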
How is this algorithm used in the real world?
¹ This algorithm is based on a 1950s model of dating where the men propose to the women, and the women accept or reject
these proposals.
The Residency Match
Perhaps the most well-known application of the Stable Marriage Algorithm is the residency match program,
which pairs medical school graduates and residency slots (internships) at teaching hospitals. Graduates
and hospitals submit their ordered preference lists, and the stable pairing produced by a computer matches
students with residency programs.
The road to the residency match program was long and twisted. Medical residency programs were first
introduced about a century ago. Since interns offered a source of cheap labor for hospitals, soon the number
of residency slots exceeded the number of medical graduates, resulting in fierce competition. Hospitals tried
to outdo each other by making their residency offers earlier and earlier. By the mid-40s, offers for residency
were being made by the beginning of junior year of medical school, and some hospitals were contemplating
even earlier offers to sophomores! The American Medical Association finally stepped in and prohibited
medical schools from releasing student transcripts and reference letters until their senior year. This sparked
a new problem, with hospitals now making "short fuse" offers to make sure that if their offer was rejected
they could still find alternate interns to fill the slot. Once again the competition between hospitals led to an
unacceptable situation, with students being given only a few hours to decide whether they would accept an
offer.
Finally, in the early 50s, this unsustainable situation led to the centralized system called the National Residency Matching Program (N.R.M.P.) in which the hospitals ranked the residents and the residents ranked
the hospitals. The N.R.M.P. then produced a pairing between the applicants and the hospitals, though at
first this pairing was not stable. It was not until 1952 that the N.R.M.P. switched to the Stable Marriage
Algorithm, resulting in a stable pairing.
Most recently, Lloyd Shapley and Alvin Roth won the 2012 Nobel Prize in Economic Sciences by extending
the Stable Marriage Algorithm we study in this lecture!²
Properties of the Stable Marriage Algorithm
We wish to show that the stable marriage algorithm is fast and finds a good pairing. But first, we must show
that it halts. Here is a simple argument: on each day that the algorithm does not halt, at least one man must
eliminate some woman from his list (otherwise the halting condition for the algorithm would be invoked).
Since each list has n elements, and there are n lists, this means that the algorithm must terminate in at most
$n^2$ days. Next, we need to show that the Stable Marriage Algorithm finds a good pairing. Before we do this,
we should discuss what we consider to be a good pairing.
Stability
What properties should a good pairing have? One possible criterion for a good pairing is one in which the
number of first ranked choices is maximized. Another possibility is to minimize the number of last ranked
choices. Or perhaps minimizing the sum of the ranks of the choices, which may be thought of as maximizing
the average happiness.
² See http://www.nobelprize.org/nobel_prizes/economic-sciences/laureates/2012/popular-economicsciences2012.pdf for more details.
In this lecture we will focus on a very basic criterion: stability. A pairing is unstable if there is a man and a
woman who prefer each other to their current partners. We will call such a pair a rogue couple. So a pairing
of $n$ men and $n$ women is stable if it has no rogue couples.
An unstable pairing from the example given in the beginning is: {(1,C), (2,B), (3,A)}. The reason is that 1
andB form a rogue couple, since 1 would rather be with B thanC(his current partner), and since B would
rather be with 1 than 2 (her current partner).
An example of a stable pairing is: {(2,A), (1,B), (3,C)}. Note that (1,A) is not a rogue couple. It is true that man 1 would rather be with woman A than his current partner. Unfortunately for him, she would rather be
with her current partner than with him. Note also that both 3 and C are paired with their least favorite choice
in this matching. Nonetheless, it is a stable pairing, since none of their preferred choices would rather be
with them.
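Checking a pairing for rogue couples is mechanical: for each man, scan the women he ranks above his partner and ask whether any of them also ranks him above her partner. A minimal sketch, reusing the example preference lists (the helper name `rogue_couples` is hypothetical):

```python
def rogue_couples(pairing, men_prefs, women_prefs):
    """Return all (man, woman) pairs who prefer each other to their partners."""
    wife = {m: w for m, w in pairing}
    husband = {w: m for m, w in pairing}
    rogues = []
    for m, prefs in men_prefs.items():
        for w in prefs:
            if w == wife[m]:
                break  # the remaining women rank below his partner
            her_list = women_prefs[w]
            if her_list.index(m) < her_list.index(husband[w]):
                rogues.append((m, w))  # she also prefers m to her partner
    return rogues


men_prefs = {1: ["A", "B", "C"], 2: ["B", "A", "C"], 3: ["A", "B", "C"]}
women_prefs = {"A": [2, 1, 3], "B": [1, 2, 3], "C": [1, 2, 3]}
unstable = {(1, "C"), (2, "B"), (3, "A")}
stable = {(2, "A"), (1, "B"), (3, "C")}
```

For the unstable pairing it reports (1, B), as in the text, and also (1, A), since there A prefers 1 to her partner 3; for the stable pairing it reports nothing.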
Before we discuss how to find a stable pairing, let us ask a more basic question: do stable pairings always
exist? Surely the answer is yes, since we could start with any pairing and make it more and more stable as
follows: if there is a rogue couple, modify the current pairing so that they are together. Repeat. Surely this
procedure must result in a stable pairing! Unfortunately, this reasoning is not sound. To demonstrate this,
let us consider a slightly different scenario, the roommates problem. Here we have $2n$ people who must be
paired up to be roommates (the difference being that, unlike the dating scenario, a person can be paired with
any of the remaining $2n - 1$ people). The point is that nothing in our intuition about the progress made by this procedure relied on the fact that men can only be paired with women, so the same intuition
should apply to the roommates problem as well. The following counter-example illustrates the fallacy in the
reasoning:
of the Stable Marriage algorithm. We will use the preference lists given earlier, which are duplicated here
for convenience:
Men's preference lists (most preferred first):
1: A B C
2: B A C
3: A B C
Women's preference lists (most preferred first):
A: 2 1 3
B: 1 2 3
C: 1 2 3
The following table shows which men propose to which women on the given day (the circled men are the
ones who were told maybe):
Thus, the stable pairing which the algorithm outputs is: {(1,A), (2,B), (3,C)}.
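The run above can be replayed with a short Python sketch. It assumes the usual day-by-day formulation of the propose-and-reject algorithm (every man proposes to the best woman not yet crossed off his list; each woman keeps her favorite proposer on a string and rejects the rest, who cross her off); the function name `propose_and_reject` and the day-by-day log are illustrative choices, not the note's notation.

```python
MEN_PREFS = {1: ["A", "B", "C"], 2: ["B", "A", "C"], 3: ["A", "B", "C"]}
WOMEN_PREFS = {"A": [2, 1, 3], "B": [1, 2, 3], "C": [1, 2, 3]}


def propose_and_reject(men_prefs, women_prefs):
    """Day-by-day propose-and-reject (assumed standard formulation).

    Each day, every man proposes to the best woman not yet crossed off his list.
    Each woman says "maybe" to her favorite proposer and rejects the others,
    who cross her off. The procedure stops on the first day with no rejection.
    Returns (pairing, log), where log[k] maps each woman to her proposers on day k+1.
    """
    remaining = {m: list(prefs) for m, prefs in men_prefs.items()}
    log = []
    while True:
        proposals = {}
        for m, prefs in remaining.items():
            proposals.setdefault(prefs[0], []).append(m)
        log.append({w: sorted(suitors) for w, suitors in proposals.items()})
        any_rejection = False
        for w, suitors in proposals.items():
            favorite = min(suitors, key=women_prefs[w].index)
            for m in suitors:
                if m != favorite:
                    remaining[m].pop(0)  # m crosses w off the top of his list
                    any_rejection = True
        if not any_rejection:
            return {m: prefs[0] for m, prefs in remaining.items()}, log


pairing, log = propose_and_reject(MEN_PREFS, WOMEN_PREFS)
for day, props in enumerate(log, start=1):
    print(f"Day {day}: {props}")
print(pairing)  # {1: 'A', 2: 'B', 3: 'C'}
```

On these lists the log shows man 3 being rejected by A on day 1 and by B on day 2 before C keeps him on day 3: `Day 1: {'A': [1, 3], 'B': [2]}`, `Day 2: {'A': [1], 'B': [2, 3]}`, `Day 3: {'A': [1], 'B': [2], 'C': [3]}`.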
Theorem: The pairing produced by the algorithm is always stable.
Proof: We will show that no man M can be involved in a rogue couple. Consider any couple (M,W) in
the pairing and suppose that M prefers some woman W* to W. We will argue that W* prefers her partner
to M, so that (M,W*) cannot be a rogue couple. Since W* occurs before W in M's list, he must have
proposed to her before he proposed to W. Therefore, according to the algorithm, W* must have rejected him
for somebody she prefers. By the Improvement Lemma, W* likes her final partner at least as much, and therefore prefers him to M. Thus no man M can be involved in a rogue couple, and the pairing is stable.
Optimality
Consider the situation in which there are 4 men and 4 women with the following preference lists:
Men's preference lists (most preferred first):
1: A B C D
2: A D C B
3: A C B D
4: A B C D
Women's preference lists (most preferred first):
A: 1 3 2 4
B: 4 3 2 1
C: 2 3 1 4
D: 3 4 2 1
For these preference lists, there are exactly two stable pairings: S = {(1,A), (2,D), (3,C), (4,B)} and T = {(1,A), (2,C), (3,D), (4,B)}. The fact that there is more than one stable pairing brings up an interesting
question. What is the best possible partner for each person, say man 2 for example?
The trivial answer is his first choice (i.e., woman A), but that is just not a realistic possibility for him. Pairing
man 2 with woman A would simply not be stable, since he is so low on her preference list. And indeed there
is no stable pairing in which 2 is paired with A. Examining the two stable pairings, we can see that the best
possible realistic outcome for man 2 is to be matched to woman D.
Let us make some definitions to better express these ideas: we say the optimal woman for a man is the
highest woman on his list whom he is paired with in any stable pairing. In other words, the optimal woman
is the best that a man can do under the condition of stability. In the above example, woman D is the optimal
woman for man 2. So the best that each man can hope for is to be paired with his optimal woman. But can
they achieve this optimality simultaneously? I.e., is there a stable pairing such that each man is paired with
his optimal woman? If such a pairing exists, we will call it a male optimal pairing. Turning to the example
above, S is a male optimal pairing since A is 1's optimal woman, D is 2's optimal woman, C is 3's optimal
woman, and B is 4's optimal woman. Similarly, we can define a female optimal pairing, which is the pairing
in which each woman is paired with her optimal man. [Exercise: Check that T is a female optimal pairing.]
We can also go in the opposite direction and define the pessimal woman for a man to be the lowest ranked
woman whom he is ever paired with in some stable pairing. This leads naturally to the notion of a male pessimal pairing. Can you define it, and also a female pessimal pairing?
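For small instances the definitions of optimal and pessimal partners can be checked by brute force. The following Python sketch (hypothetical helper names; it tries all $n!$ pairings, so it is only meant for toy examples like this one) enumerates the stable pairings of the four-man, four-woman example and reads off each man's optimal and pessimal woman.

```python
from itertools import permutations

# The 4-man, 4-woman example (most preferred first).
MEN_PREFS = {1: ["A", "B", "C", "D"], 2: ["A", "D", "C", "B"],
             3: ["A", "C", "B", "D"], 4: ["A", "B", "C", "D"]}
WOMEN_PREFS = {"A": [1, 3, 2, 4], "B": [4, 3, 2, 1],
               "C": [2, 3, 1, 4], "D": [3, 4, 2, 1]}


def is_stable(wife, men_prefs, women_prefs):
    """True if the pairing (a dict man -> woman) has no rogue couple."""
    husband = {w: m for m, w in wife.items()}
    for m, prefs in men_prefs.items():
        # Only women that m ranks above his wife can form a rogue couple with him.
        for w in prefs[: prefs.index(wife[m])]:
            if women_prefs[w].index(m) < women_prefs[w].index(husband[w]):
                return False
    return True


def stable_pairings(men_prefs, women_prefs):
    """Brute force: try all n! pairings and keep the stable ones."""
    men, women = sorted(men_prefs), sorted(women_prefs)
    candidates = (dict(zip(men, perm)) for perm in permutations(women))
    return [wife for wife in candidates if is_stable(wife, men_prefs, women_prefs)]


def optimal_and_pessimal(pairings, men_prefs):
    """Each man's best and worst partner over all stable pairings."""
    optimal, pessimal = {}, {}
    for m, prefs in men_prefs.items():
        partners = {p[m] for p in pairings}
        optimal[m] = min(partners, key=prefs.index)
        pessimal[m] = max(partners, key=prefs.index)
    return optimal, pessimal


pairings = stable_pairings(MEN_PREFS, WOMEN_PREFS)
optimal, pessimal = optimal_and_pessimal(pairings, MEN_PREFS)
print(len(pairings))  # 2
print(optimal)        # {1: 'A', 2: 'D', 3: 'C', 4: 'B'}, i.e. S
print(pessimal)       # {1: 'A', 2: 'C', 3: 'D', 4: 'B'}, i.e. T
```

It finds exactly the two pairings S and T; the men's optimal women form S, and their pessimal women form T.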
Now, a natural question to ask is: Who is better off in the Stable Marriage Algorithm: men or women?
Think about this before you read on...
Theorem: The pairing output by the Stable Marriage algorithm is male optimal.
Proof: Suppose for the sake of contradiction that the pairing is not male optimal. Assume the first day when
a man got rejected by his optimal woman was day k. On this day, M was rejected by W (his optimal mate)
in favor of M* who proposed to her. By definition of optimal woman, there must exist a stable pairing
T in which M and W are paired together. Suppose T looks like this: {..., (M,W), ..., (M*,W'), ...}. We
will argue that (M*,W) is a rogue couple in T, thus contradicting stability.
First, it is clear that W prefers M* to M, since she rejected M in favor of M* during the execution of the
stable marriage algorithm. Moreover, since day k was the first day when some man got rejected by his
optimal woman, on day k, M* had not yet been rejected by his optimal woman. Since he proposed to W on
the k-th day, this implies that M* likes W at least as much as his optimal woman, and therefore at least as
much as W'. Since W' ≠ W (W is paired with M in T), M* strictly prefers W to W'. Therefore, (M*,W) form a rogue couple in T, and so T is not stable. Contradiction. Thus, our assumption was wrong and the pairing is male optimal.
What proof techniques did we use to prove this theorem? Again it is a proof by induction, structured as an
application of the well-ordering principle. How do we see it as a regular induction proof? This is a bit subtle
to figure out. See if you can do so before reading on. As a hint, the proof is really showing by induction on k
the following statement: for every k, no man gets rejected by his optimal woman on the kth day. [Exercise:
can you complete the induction?]
So men appear to fare very well by following this algorithm. How about the women? The following theorem
confirms the sad truth:
Theorem: If a pairing is male optimal, then it is also female pessimal.
Proof: Let T = {. . . , (M, W), . . . } be the male optimal pairing (which we know is output by the algorithm).
Suppose for the sake of contradiction that there exists a stable pairing S = {..., (M*, W), ..., (M, W'), ...}
such that M* is lower on W's list than M (i.e., M is not her pessimal man). We will argue that S cannot
possibly be stable by showing that (M,W) is a rogue couple in S. By assumption, W prefers M to M* since
M* is lower on her list. And M prefers W to his partner W' in S, because W is his partner in the male optimal
pairing T (so she is his optimal woman) and W' ≠ W. Contradiction. Therefore, the male optimal pairing is female pessimal.
All this seems a bit unfair to the women! Are there any lessons to be learned from this? Make the first move!
Returning to the National Residency Matching Program: until recently, the algorithm was run with the hospitals
doing the proposing, and so the pairings produced were hospital optimal. In the nineties, the roles were
reversed so that the medical students were proposing to the hospitals. More recently, there were other
improvements made to the algorithm which the N.R.M.P. used. For example, the pairing takes into account
preferences for married students for positions at the same or nearby hospitals.
Further reading (optional!)
Though it was in use 10 years earlier, the propose-and-reject algorithm was first properly analyzed by Gale and Shapley, in a famous paper dating back to 1962 that still stands as one of the great achievements in the
analysis of algorithms. The full reference is:
D. Gale and L.S. Shapley, "College Admissions and the Stability of Marriage," American Mathematical
Monthly 69 (1962), pp. 9-15.
Stable marriage and its numerous variants remain an active topic of research in computer science. Although
it is by now twenty years old, the following very readable book covers many of the interesting developments
since Gale and Shapley's algorithm:
D. Gusfield and R.W. Irving, The Stable Marriage Problem: Structure and Algorithms, MIT Press, 1989.
CS 70 Discrete Mathematics and Probability Theory
Fall 2013 Vazirani Note 3
Modular Arithmetic
In several settings, such as error-correcting codes and cryptography, we sometimes wish to work over a
smaller range of numbers. Modular arithmetic is useful in these settings, since it limits numbers to a predefined range $\{0, 1, \dots, N-1\}$, and wraps around whenever you try to leave this range, like the hand of a clock (where $N = 12$) or the days of the week (where $N = 7$).
Example: Calculating the time. When you calculate the time, you automatically use modular arithmetic.
For example, if you are asked what time it will be 13 hours from 1 pm, you say 2 am rather than 14.
Let's assume our clock displays 12 as 0. This is limiting numbers to a predefined range, $\{0, 1, 2, \dots, 11\}$. Whenever you add two numbers in this setting, you divide by 12 and provide the remainder as the answer.
If we wanted to know what the time would be 24 hours from 2 pm, the answer is easy. It would be 2 pm.
This is true not just for 24 hours, but for any multiple of 12 hours. What about 25 hours from 2 pm? Since
the time 24 hours from 2 pm is still 2 pm, 25 hours later it would be 3 pm. Another way to say this is that
we add 1 hour, which is the remainder when we divide 25 by 12.
This example shows that under certain circumstances it makes sense to do arithmetic within the confines
of a particular number (12 in this example). That is, we only keep track of the remainder when we divide
by 12, and when we need to add two numbers, instead we just add the remainders. This method is quite
efficient in the sense of keeping intermediate values as small as possible, and we shall see in later notes how
useful it can be.
More generally we can define $x \bmod m$ (in words "x modulo m") to be the remainder r when we divide x by
m. I.e., if $x \bmod m = r$, then $x = mq + r$ where $0 \le r \le m - 1$ and q is an integer. Thus $5 = 29 \bmod 12$ and $3 = 13 \bmod 5$.
Computation
If we wish to calculate $x + y \bmod m$, we would first add $x + y$ and then calculate the remainder when we divide the result by m. For example, if $x = 14$, $y = 25$ and $m = 12$, we would compute the remainder when we divide $x + y = 14 + 25 = 39$ by 12, to get the answer 3. Notice that we would get the same answer if we first computed $2 = x \bmod 12$ and $1 = y \bmod 12$ and added the results modulo 12 to get 3. The same holds for subtraction: $x - y \bmod 12$ is $-11 \bmod 12$, which is 1. Again, we could have directly obtained this as $2 - 1$ by first simplifying $x \bmod 12$ and $y \bmod 12$.
This is even more convenient if we are trying to multiply: to compute $xy \bmod 12$, we could first compute
$xy = 14 \times 25 = 350$ and then compute the remainder when we divide by 12, which is 2. Notice that we get the same answer if we first compute $2 = x \bmod 12$ and $1 = y \bmod 12$ and simply multiply the results modulo 12.
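A quick Python check with the numbers from the text. Note that Python's `%` operator returns a result in $\{0, \dots, m-1\}$ for positive m even when the left operand is negative (C's and Java's `%` do not), which is what lets $-11 \bmod 12$ come out as 1.

```python
x, y, m = 14, 25, 12

# Compute directly, then reduce once at the end.
direct_sum = (x + y) % m
direct_diff = (x - y) % m   # (-11) % 12 == 1 in Python
direct_prod = (x * y) % m

# Reduce the operands first, then reduce the result again.
xr, yr = x % m, y % m       # 2 and 1
reduced_sum = (xr + yr) % m
reduced_diff = (xr - yr) % m
reduced_prod = (xr * yr) % m

print(direct_sum, direct_diff, direct_prod)     # 3 1 2
print(reduced_sum, reduced_diff, reduced_prod)  # 3 1 2
```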
More generally, while carrying out any sequence of additions, subtractions or multiplications modm, we
get the same answer even if we reduce any intermediate results mod m. This can considerably simplify the
calculations.
Set Representation
There is an alternate view of modular arithmetic which helps understand all this better. For any integer m
we say that x and y are congruent modulo m if they differ by a multiple of m, or in symbols,
$$x = y \bmod m \iff m \text{ divides } (x - y).$$
Note that you may also see this written as $x \equiv y \pmod m$. For example, 29 and 5 are congruent modulo 12 because 12 divides $29 - 5$. We can also write $22 = -2 \bmod 12$. Notice that x and y are congruent modulo m iff they have the same remainder modulo m.
What is the set of numbers that are congruent to 0 mod 12? These are all the multiples of 12:
$\{\dots, -36, -24, -12, 0, 12, 24, 36, \dots\}$. What about the set of numbers that are congruent to 1 mod 12? These are all the numbers that give a remainder 1 when divided by 12: $\{\dots, -35, -23, -11, 1, 13, 25, 37, \dots\}$. Similarly, the set of numbers congruent to 2 mod 12 is $\{\dots, -34, -22, -10, 2, 14, 26, 38, \dots\}$. Notice in this way we get 12 such sets of integers, and every integer belongs to one and only one of these sets.
In general if we work modulo m, then we get m such disjoint sets whose union is the set of all integers. We
can think of each set as represented by the unique element it contains in the range $\{0, \dots, m-1\}$. The set represented by element i would be all numbers z such that $z = mx + i$ for some integer x. Observe that all of these numbers have remainder i when divided by m; they are therefore congruent modulo m.
We can understand the operations of addition, subtraction and multiplication in terms of these sets. When
we add two numbers, say $x = 2 \bmod 12$ and $y = 1 \bmod 12$, it does not matter which x and y we pick from the two sets, since the result is always an element of the set that contains 3. The same is true about subtraction
and multiplication. It should now be clear that the elements of each set are interchangeable when computing
modulom, and this is why we can reduce any intermediate results modulom.
Here is a more formal way of stating this observation:
Theorem 3.1: If $a = c \bmod m$ and $b = d \bmod m$, then $a + b = c + d \bmod m$ and $a \cdot b = c \cdot d \bmod m$.
Proof: We know that $c = a + k \cdot m$ and $d = b + \ell \cdot m$, so $c + d = a + k \cdot m + b + \ell \cdot m = a + b + (k + \ell) \cdot m$, which means that $a + b = c + d \bmod m$. The proof for multiplication is similar and left as an exercise.
What this theorem tells us is that we can always reduce any arithmetic expression modulo m into a natural
number smaller than m. As an example, consider the expression $(13 + 11) \cdot 18 \bmod 7$. Using the above theorem several times we can write:
$$\begin{aligned}
(13 + 11) \cdot 18 &= (6 + 4) \cdot 4 \bmod 7\\
&= 10 \cdot 4 \bmod 7\\
&= 3 \cdot 4 \bmod 7\\
&= 12 \bmod 7\\
&= 5 \bmod 7.
\end{aligned}$$
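A small Python check of the worked example and of Theorem 3.1 over a range of representatives. An exhaustive check over a finite range is only a sanity check, not a proof.

```python
m = 7

# The worked example: reduce every operand first, or evaluate first and reduce once.
reduced = ((13 % m + 11 % m) * (18 % m)) % m
direct = ((13 + 11) * 18) % m
print(reduced, direct)  # 5 5

# Spot-check Theorem 3.1: replacing a by c = a + k*m and b by d = b + j*m
# leaves the sum and the product unchanged modulo m.
for a in range(m):
    for b in range(m):
        for k in range(-3, 4):
            for j in range(-3, 4):
                c, d = a + k * m, b + j * m
                assert (a + b) % m == (c + d) % m
                assert (a * b) % m == (c * d) % m
print("Theorem 3.1 holds on every tested case")
```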
In summary, we can always do calculations modulom by reducing intermediate results modulom.
Inverses
We have so far discussed addition, multiplication and subtraction. What about division? This is a bit harder.
Over the reals, dividing by a number x is the same as multiplying by $y = 1/x$. Here y is that number such
that $x \cdot y = 1$. Of course we have to be careful when $x = 0$, since such a y does not exist. Similarly, when we wish to divide by $x \bmod m$, we need to find $y \bmod m$ such that $x \cdot y = 1 \bmod m$; then dividing by x modulo m will be the same as multiplying by y modulo m. Such a y is called the multiplicative inverse of x modulo
m. In our present setting of modular arithmetic, can we be sure that x has an inverse mod m, and if so, is it
unique (modulom) and can we compute it?
As a first example, take $x = 8$ and $m = 15$. Then $2x = 16 = 1 \bmod 15$, so 2 is a multiplicative inverse of 8
mod 15. As a second example, take $x = 12$ and $m = 15$. Then the sequence $\{ax \bmod m : a = 0, 1, 2, \dots\}$ is periodic, and takes on the values $\{0, 12, 9, 6, 3\}$ (check this!). Thus 12 has no multiplicative inverse mod 15.
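Both examples can be reproduced with a brute-force search in Python. This sketch (hypothetical function `brute_force_inverse`) simply tries every candidate, so its running time grows in proportion to m; the approach via Euclid's algorithm developed below is far faster.

```python
def brute_force_inverse(x, m):
    """Return the a in {0, ..., m-1} with a*x = 1 (mod m), or None if none exists."""
    for a in range(m):
        if (a * x) % m == 1:
            return a
    return None


print(brute_force_inverse(8, 15))                  # 2
print(brute_force_inverse(12, 15))                 # None
print(sorted({(a * 12) % 15 for a in range(15)}))  # [0, 3, 6, 9, 12]
```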
So when does x have a multiplicative inverse modulo m? The answer is: iff the greatest common divisor
of m and x is 1. Moreover, when the inverse exists it is unique. Recall that the greatest common divisor of
two natural numbers x and y, denoted gcd(x,y), is the largest natural number that divides them both. For example, gcd(30,24) = 6. If gcd(x,y) is 1, it means that x and y share no common factors (except 1). This is often expressed by saying that x and y are relatively prime.
Theorem 3.2: Let m, x be positive integers such that gcd(m,x) = 1. Then x has a multiplicative inverse modulo m, and it is unique (modulo m).
Proof: Consider the sequence of m numbers $0, x, 2x, \dots, (m-1)x$. We claim that these are all distinct modulo m. Since there are only m distinct values modulo m, it must then be the case that $ax = 1 \bmod m$ for exactly one a (modulo m). This a is the unique multiplicative inverse.
To verify the above claim, suppose that $ax = bx \bmod m$ for two distinct values a, b in the range $0 \le b < a \le m - 1$. Then we would have $(a - b)x = 0 \bmod m$, or equivalently, $(a - b)x = km$ for some integer k (possibly zero or negative).
However, x and m are relatively prime, so x cannot share any factors with m. This implies that $a - b$ must be an integer multiple of m. This is not possible, since $a - b$ ranges between 1 and $m - 1$.
Actually it turns out that gcd(m,x) = 1 is also a necessary condition for the existence of an inverse: i.e., if gcd(m,x) > 1 then x has no multiplicative inverse modulo m. You might like to try to prove this using a similar idea to that in the above proof.
Since we know that multiplicative inverses are unique when gcd(m,x) = 1, we shall write the inverse of x as $x^{-1} \bmod m$. Being able to compute the multiplicative inverse of a number is crucial to many applications,
so ideally the algorithm used should be efficient. It turns out that we can use an extended version of Euclid's
algorithm, which computes the gcd of two numbers, to compute the multiplicative inverse.
Computing the Multiplicative Inverse
Let us first discuss how computing the multiplicative inverse of x modulo m is related to finding gcd(x,m). For any pair of numbers x, y, suppose we could not only compute gcd(x,y), but also find integers a, b such that
$$d = \gcd(x, y) = ax + by. \qquad (1)$$
(Note that this is not a modular equation; and the integers a, b could be zero or negative.) For example, we can write $1 = \gcd(35, 12) = -1 \cdot 35 + 3 \cdot 12$, so here $a = -1$ and $b = 3$ are possible values for a, b.
If we could do this then we'd be able to compute inverses, as follows. We first find the integers a and b such
that
$$1 = \gcd(m, x) = am + bx.$$
But this means that $bx = 1 \bmod m$, so b is the multiplicative inverse of x modulo m. Reducing b modulo m gives us the unique inverse we are looking for. In the above example, we see that 3 is the multiplicative
inverse of 12 mod 35. So, we have reduced the problem of computing inverses to that of finding integers
a, b that satisfy equation (1). Remarkably, Euclid's algorithm for computing gcds also allows us to find the integers a and b described above. So computing the multiplicative inverse of x modulo m is as simple as
running Euclid's gcd algorithm on input x and m!
Euclid's Algorithm
If we wish to compute the gcd of two numbers x and y, how would we proceed? If x or y is 0, then computing
the gcd is easy; it is simply the other number, since 0 is divisible by everything (although of course it divides
nothing). The algorithm for computing gcd(x,y) uses the following theorem to eventually reduce to the case where one of the numbers is 0:
Theorem 3.3: Let $x \ge y$ and let q, r be natural numbers such that $x = yq + r$ and $r \ge 0$. Then $\gcd(x, y) = \gcd(r, y)$.
Let's go through a quick example of this recursive implementation of Euclid's algorithm. We wish to
compute gcd(32,10):
gcd(32,10) = gcd(10,2)
= gcd(2,0)
= 2
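A minimal recursive sketch in Python of the reduction used in this example, assuming natural-number inputs; `gcd_trace` is an illustrative helper that records the argument pairs the recursion visits.

```python
def gcd(x, y):
    """Euclid's algorithm for natural numbers: gcd(x, 0) = x, else recurse on (y, x mod y)."""
    if y == 0:
        return x
    return gcd(y, x % y)


def gcd_trace(x, y):
    """Illustrative helper: the argument pairs visited by the recursion above."""
    calls = [(x, y)]
    while y != 0:
        x, y = y, x % y
        calls.append((x, y))
    return calls


print(gcd(32, 10))        # 2
print(gcd_trace(32, 10))  # [(32, 10), (10, 2), (2, 0)]
```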
Extended Euclid's Algorithm
In order to compute the multiplicative inverse, we need an algorithm which also returns integers a and b
such that:
gcd(x,y) =ax + by.
Now since this problem is a generalization of the basic gcd, it is perhaps not too surprising that we can solve it with a fairly straightforward extension of Euclid's algorithm.
Examples
Let's first see how we would compute such numbers for $x = 6$ and $y = 4$. We'll need the equations from our example above, copied here for reference:
$$\begin{aligned}
16 &= 10 \cdot 1 + 6\\
10 &= 6 \cdot 1 + 4\\
6 &= 4 \cdot 1 + 2\\
4 &= 2 \cdot 2 + 0
\end{aligned}$$
From the last two equations it follows that gcd(6,4) = 2. But now the second last equation gives us the numbers a, b, since we just rearrange that equation to say $2 = 6 \cdot 1 - 4 \cdot 1$. So $a = 1$ and $b = -1$.
What if we started with $x = 10$ and $y = 6$? Now we would use the last three equations to determine that gcd(10,6) = 2. But how do we find a, b? Start as above and write $2 = 6 \cdot 1 - 4 \cdot 1$. But we want 10 and 6 on the right hand side, not 6 and 4. But notice that the third from the last equation allows us to write 4 as a linear
combination of 6 and 10, and so we can just back substitute: we rewrite that equation as $4 = 10 \cdot 1 - 6 \cdot 1$ and substitute to get:
$$2 = 6 \cdot 1 - 4 \cdot 1 = 6 \cdot 1 - (10 \cdot 1 - 6 \cdot 1) = 6 \cdot 2 - 10 \cdot 1.$$
If we started with $x = 16$ and $y = 10$, we would back substitute again using the first equation rewritten as
$6 = 16 - 10$ to get: $2 = 6 \cdot 2 - 10 \cdot 1 = (16 - 10) \cdot 2 - 10 = 16 \cdot 2 - 10 \cdot 3$. So $a = 2$ and $b = -3$.
Algorithm
The following recursive algorithm extended-gcd implements the idea used in the examples above. It takes as
input a pair of natural numbers $x \ge y$ as in Euclid's algorithm, and returns a triple of integers (d, a, b) such that $d = \gcd(x, y)$ and $d = ax + by$:
algorithm extended-gcd(x,y)
    if y = 0 then return(x, 1, 0)
    else
        (d, a, b) := extended-gcd(y, x mod y)
        return((d, b, a - (x div y) * b))
Note that this algorithm has the same form as the basic gcd algorithm we saw earlier; the only difference is
that we now carry around in addition the required values a, b. You should hand-turn the algorithm on the input (x,y) = (16,10) from our earlier example, and check that it delivers correct values for a, b.
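Here is a direct Python transcription of extended-gcd, plus a hypothetical `inverse` helper that applies it as described earlier (find $1 = am + bx$, then reduce b mod m). Python's `//` plays the role of `div` for the natural-number inputs assumed here.

```python
def extended_gcd(x, y):
    """Return (d, a, b) with d = gcd(x, y) and d = a*x + b*y."""
    if y == 0:
        return x, 1, 0
    d, a, b = extended_gcd(y, x % y)
    # d = a*y + b*(x mod y) = b*x + (a - (x // y)*b)*y
    return d, b, a - (x // y) * b


def inverse(x, m):
    """Multiplicative inverse of x modulo m; raises ValueError if gcd(m, x) != 1."""
    d, a, b = extended_gcd(m, x)  # d = a*m + b*x
    if d != 1:
        raise ValueError(f"{x} has no inverse modulo {m}")
    return b % m


print(extended_gcd(16, 10))  # (2, 2, -3)
print(extended_gcd(35, 12))  # (1, -1, 3)
print(inverse(12, 35))       # 3
print(inverse(8, 15))        # 2
```

Since Python 3.8, the built-in `pow(x, -1, m)` also returns the modular inverse, which makes a convenient cross-check.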
You'll see a full analysis of this algorithm in CS 170, including correctness and efficiency (the running
time is $O(n^3)$ for n-bit inputs). Let us understand intuitively why the numbers a and b returned by the algorithm should give us what we are looking for. We just need to generalize the back substitution method we used in the
example above. The algorithm reduces finding gcd(x,y) to finding gcd(y, x mod y). Once the algorithm finds gcd(y, x mod y), it returns values a and b such that:
$$d = ay + b(x \bmod y). \qquad (2)$$
Now we need to update these values of a and b, say to A and B, such that
$$d = Ax + By. \qquad (3)$$
To figure out what A and B should be, we need to rearrange equation (2), as follows:
$$\begin{aligned}
d &= ay + b(x \bmod y)\\
&= ay + b(x - \lfloor x/y \rfloor y)\\
&= bx + (a - \lfloor x/y \rfloor b)y.
\end{aligned}$$
(In the second line here, we have used the fact that $x \bmod y = x - \lfloor x/y \rfloor y$; check this!) Comparing this last equation with equation (3), we see that we need to take $A = b$ and $B = a - \lfloor x/y \rfloor b$. This is exactly what the algorithm does. Of course we have not fully proved correctness, but you should be able to see why the
algorithm works.
CS 70 Discrete Mathematics and Probability Theory
Fall 2013 Vazirani Note 4
This note is partly based on Section 1.4 of "Algorithms," by S. Dasgupta, C. Papadimitriou and U. Vazirani, McGraw-Hill, 2007.
An online draft of the book is available at http://www.cs.berkeley.edu/~vazirani/algorithms.html
Public Key Cryptography
Bijections
This note introduces the fundamental concept of a function, as well as a famous function called RSA that
forms the basis of public-key cryptography. A function is a mapping from a set of inputs A to a set of outputs
B: for input $x \in A$, $f(x)$ must be in the set B. To denote such a function, we write $f : A \to B$.
Consider the following examples of functions, where both functions map $\{0, \dots, m-1\}$ to itself:
$$f(x) = x + 1 \bmod m$$
$$g(x) = 2x \bmod m$$
A bijection is a function for which every $b \in B$ has a unique pre-image $a \in A$ such that $f(a) = b$. Note that this consists of two conditions:
1. f is onto: every $b \in B$ has a pre-image $a \in A$.
2. f is one-to-one: for all $a, a' \in A$, if $f(a) = f(a')$ then $a = a'$.
Looking back at our examples, we can see that f is a bijection; the unique pre-image of y is $y - 1$ (mod m). However, g is only a bijection if m is odd. Otherwise, it is neither one-to-one nor onto. The following lemma can be
used to prove that a function is a bijection:
Lemma: For a finite set A, $f : A \to A$ is a bijection if there is an inverse function $g : A \to A$ such that $\forall x \in A$, $g(f(x)) = x$.
Proof: If $f(x) = f(x')$, then $x = g(f(x)) = g(f(x')) = x'$. Therefore, f is one-to-one. Since f is one-to-one, there must be |A| elements in the range of f. This implies that f is also onto.
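The two example functions can be checked for bijectivity on small moduli with a few lines of Python (illustrative helper names). The check relies on the finite-set counting argument from the proof: m images that cover all m values means the map is onto, hence also one-to-one.

```python
def f(x, m):
    return (x + 1) % m


def g(x, m):
    return (2 * x) % m


def f_inverse(y, m):
    # The unique pre-image of y under f is y - 1 (mod m).
    return (y - 1) % m


def is_bijection(func, m):
    """True if func(., m) maps {0, ..., m-1} one-to-one onto itself."""
    images = [func(x, m) for x in range(m)]
    return sorted(images) == list(range(m))


for m in (7, 8):
    print(m, is_bijection(f, m), is_bijection(g, m))
# 7 True True
# 8 True False

# The lemma's condition for f: f_inverse(f(x)) = x for every x.
print(all(f_inverse(f(x, 8), 8) == x for x in range(8)))  # True
```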
RSA
One of the most useful bijections is the RSA function, named after its inventors Ronald Rivest, Adi Shamir
and Leonard Adleman:
$$E(x) \equiv x^e \bmod N$$
where $N = pq$ (p and q are two large primes), $E : \{0, \dots, N-1\} \to \{0, \dots, N-1\}$ and e is relatively prime to $(p-1)(q-1)$. The inverse of the RSA function is:
$$D(x) \equiv x^d \bmod N$$
where d is the inverse of e mod $(p-1)(q-1)$.
Consider the following setting. Alice and Bob wish to communicate confidentially over some (insecure)
link. Eve, an eavesdropper who is listening in, would like to discover what they are saying. Let's assume
that Alice wants to transmit a message x (a number between 1 and $N - 1$) to Bob. She will apply her encryption function E to x and send the encrypted message E(x) (also called the ciphertext) over the link; Bob, upon receipt of E(x), will then apply his decryption function D to it. Since D is the inverse of E, Bob will obtain the original message x. However, how can we be sure that Eve cannot also obtain x?
In order to encrypt the message, Alice only needs Bob's public key (N, e). In order to decrypt the message, Bob needs his private key d. The pair (N, e) can be thought of as a public lock: anyone can place a message in a box and lock it, but only Bob has the key d to open the lock. The idea is that since Eve does not have
access to d, she will not be able to gain information about Alice's message.
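A toy run of the scheme in Python, assuming the textbook-sized primes p = 5 and q = 11 and the exponent e = 3 (none of these values come from the note, and they are far too small to be secure). It relies on the built-in three-argument `pow` for modular exponentiation and on `pow(e, -1, phi)`, available since Python 3.8, for the inverse of e mod $(p-1)(q-1)$.

```python
# Toy parameters: far too small to be secure, chosen only so the numbers are readable.
p, q = 5, 11
N = p * q                # 55
phi = (p - 1) * (q - 1)  # 40
e = 3                    # gcd(3, 40) = 1
d = pow(e, -1, phi)      # inverse of e mod (p-1)(q-1); needs Python 3.8+


def encrypt(x):
    return pow(x, e, N)  # E(x) = x^e mod N, using the public key (N, e)


def decrypt(y):
    return pow(y, d, N)  # D(y) = y^d mod N, using the private key d


message = 13
ciphertext = encrypt(message)
print(d, ciphertext, decrypt(ciphertext))  # 27 52 13
```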
We will now prove that D(E(x)) = x (and therefore E is a bijection). We will require a beautiful theorem from number theory known as Fermat's Little Theorem, which is the following:
Theorem 4.1: [Fermat's Little Theorem] For any prime p and any $a \in \{1, 2, \dots, p-1\}$, we have $a^{p-1} = 1 \bmod p$.
Let S be the nonzero integers modulo p; that is, $S = \{1, 2, \dots, p-1\}$. Define a function $f : S \to S$ such that $f(x) \equiv ax \bmod p$. Here's the crucial observation: f is simply a bijection from S to S; it permutes the elements of S. For instance, here's a picture of the case $a = 3$, $p = 7$:
[Figure 1: Multiplication by (3 mod 7), drawn as a mapping from S = {1, ..., 6} to itself]
With this intuition, we can now prove Fermat's Little Theorem:
Proof: Our first claim is that f(x) is a bijection. We will then show that this claim implies the theorem.
To show that f is a bijection, we simply need to argue that the numbers $a \cdot i \bmod p$ are distinct. This is because if $a \cdot i \equiv a \cdot j \pmod p$, then dividing both sides by a gives $i \equiv j \pmod p$. They are nonzero because $a \cdot i \equiv 0$ similarly implies $i \equiv 0$. (And we can divide by a, because by assumption it is nonzero and therefore relatively prime to p.)
Now we can prove the theorem. Since f is a bijection, we know that the image of f is S. Now if we take the
product of all elements in S, it is equal to the product of all elements in the image of f:
$$(p-1)! \equiv a^{p-1} (p-1)! \pmod p.$$
Dividing by $(p-1)!$ (which we can do because it is relatively prime to p, since p is assumed prime) then gives the theorem.
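For the case $a = 3$, $p = 7$ used in Figure 1, the permutation and the product argument can be checked numerically in Python (a sanity check, not a proof):

```python
from math import prod

p, a = 7, 3
S = list(range(1, p))

# f(x) = a*x mod p, the mapping drawn in Figure 1 for a = 3, p = 7.
image = [(a * x) % p for x in S]
print(image)               # [3, 6, 2, 5, 1, 4]
print(sorted(image) == S)  # True: f permutes S

# Because image is a rearrangement of S, the products agree mod p:
# (p-1)! = a^(p-1) * (p-1)!  (mod p), which yields Fermat's Little Theorem.
print(prod(image) % p == prod(S) % p)  # True

# Direct check of a^(p-1) = 1 (mod p) for every a in S.
print(all(pow(b, p - 1, p) == 1 for b in S))  # True
```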
Let us return to proving that D(E(x)) = x:
Theorem 4.2: Under the above definitions of the encryption and decryption functions E and D, we have
$D(E(x)) = x \bmod N$ for every possible message $x \in \{0, 1, \dots, N-1\}$.
The proof of this theorem relies on Fermat's Little Theorem:
Proof of Theorem 4.2: To prove the statement, we have to show that
$$(x^e)^d = x \bmod N \quad \text{for every } x \in \{0, 1, \dots, N-1\}. \qquad (1)$$
Let's consider the exponent, which is ed. By definition of d, we know that $ed = 1 \bmod (p-1)(q-1)$; hence we can write $ed = 1 + k(p-1)(q-1)$ for some integer k, and therefore
$$x^{ed} - x = x^{1 + k(p-1)(q-1)} - x = x\left(x^{k(p-1)(q-1)} - 1\right). \qquad (2)$$
Looking back at equation (1), our goal is to show that this last expression in equation (2) is equal to 0 mod N
for every x.
Now we claim that the expression $x(x^{k(p-1)(q-1)} - 1)$ in (2) is divisible by p. To see this, we consider two cases:
Case 1: x is not a multiple of p. In this case, since $x \neq 0 \bmod p$, we can use Fermat's Little Theorem to deduce that $x^{p-1} = 1 \bmod p$. Then $(x^{p-1})^{k(q-1)} \equiv 1^{k(q-1)} \bmod p$ and hence $x^{k(p-1)(q-1)} - 1 = 0 \bmod p$, as required.
Case 2: x is a multiple of p. In this case the expression in (2), which has x as a factor, is clearly divisible by p.
By an entirely symmetrical argument, $x(x^{k(p-1)(q-1)} - 1)$ is also divisible by q. Therefore, it is divisible by both p and q, and since p and q are distinct primes it must be divisible by their product, $pq = N$. But this implies
that the expression is equal to 0 mod N, which is exactly what we wanted to prove.
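Theorem 4.2 can also be checked exhaustively for small parameters. The Python sketch below (hypothetical helper `rsa_roundtrip_holds`; the prime pairs are arbitrary small examples) tries every message, including the multiples of p and q covered by Case 2, for every valid odd exponent e.

```python
from math import gcd


def rsa_roundtrip_holds(p, q, e):
    """Check D(E(x)) = x for every x in {0, ..., N-1}, where N = p*q.

    Assumes p and q are distinct primes and gcd(e, (p-1)(q-1)) = 1.
    """
    N, phi = p * q, (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse; Python 3.8+
    return all(pow(pow(x, e, N), d, N) == x for x in range(N))


# Every valid odd exponent for a few small prime pairs; range(N) includes
# the multiples of p and q handled by Case 2 of the proof.
cases = [
    (p, q, e)
    for p, q in [(3, 5), (5, 11), (7, 13), (11, 17)]
    for e in range(3, (p - 1) * (q - 1), 2)
    if gcd(e, (p - 1) * (q - 1)) == 1
]
print(all(rsa_roundtrip_holds(p, q, e) for p, q, e in cases))  # True
```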
So we have seen that the RSA protocol is correct, in the sense that Alice can encrypt messages in such a way
that Bob can reliably decrypt them again. But how do we know that it is secure, i.e., that Eve cannot get any
useful information by observing the encrypted messages? The security of RSA hinges upon the following
simple assumption:
Given N, e and $y = x^e \bmod N$, there is no efficient algorithm for determining x.
This assumption is quite plausible. How might Eve try to guess x? She could experiment with all possible
values of x, each time checking whether $x^e = y \bmod N$; but she would have to try on the order of N values
of x, which is completely unrealistic if N is a number with (say) 512 bits. This is because $N \approx 2^{512}$ is larger than estimates for the age of the Universe in femtoseconds! Alternatively, she could try to factor N to
retrieve p and q, and then figure out d by computing the inverse of e mod $(p-1)(q-1)$; but this approach requires Eve to be able to factor N into its prime factors, a problem which is believed to be impossible to
solve efficiently for large values of N. She could try to compute the quantity $(p-1)(q-1)$ without factoring N; but it is possible to show that computing $(p-1)(q-1)$ is equivalent to factoring N. We should point out that the security of RSA has not been formally proved: it rests on the assumptions that breaking RSA is
essentially tantamount to factoring N, and that factoring is hard.
We close this note with a brief discussion of implementation issues for RSA. Since we have argued that
breaking RSA is impossible because factoring would take a very long time, we should check that the computations that Alice and Bob themselves have to perform are much simpler, and can be done efficiently.
There are really only two non-trivial things that Alice and Bob have to do:
1. Bob has to find prime numbers p and q, each having many (say, 512) bits.
2. Both Alice and Bob have to compute exponentials mod N. (Alice has to compute $x^e \bmod N$, and Bob
has to compute $y^d \bmod N$.)
Both of these tasks can be carried out efficiently. The first requires the implementation of an efficient test
for primality as well as a rich source of primes. You will learn how to tackle each of these tasks in the
algorithms courseCS170. The second requires an efficient algorithm for modular exponentiation, which is
not very difficult, but will also be discussed in detail in CS170.
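Modular exponentiation is usually done by repeated squaring, which needs only about $\log_2 e$ squarings and multiplications, each on numbers below $N^2$. Below is a minimal Python sketch of that standard technique (not necessarily the presentation CS170 uses), cross-checked against the built-in three-argument `pow`; the function name `mod_exp` is illustrative.

```python
def mod_exp(base, exponent, modulus):
    """Compute base**exponent % modulus by repeated squaring (exponent >= 0).

    Uses O(log exponent) multiplications, each on numbers smaller than modulus**2.
    """
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                  # current low bit of the exponent is 1
            result = (result * base) % modulus
        base = (base * base) % modulus    # base^(2^i) becomes base^(2^(i+1))
        exponent >>= 1
    return result


print(mod_exp(13, 3, 55), mod_exp(52, 27, 55))                      # 52 13
print(mod_exp(7, 10**18, 10**9 + 7) == pow(7, 10**18, 10**9 + 7))  # True
```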
To summarize, then, in the RSA protocol Bob need only perform simple calculations such as multiplication,
exponentiation and primality testing to implement his digital lock. Similarly, Alice and Bob need only
perform simple calculations to lock and unlock the message, respectively: operations that any pocket
computing device could handle. By contrast, to unlock the message without the key, Eve would have to perform operations like factoring large numbers, which (at least according to widely accepted belief)
requires more computational power than all the world's most sophisticated computers combined! This
compelling guarantee of security without the need to share a secret key in advance explains why the RSA cryptosystem is
such a revolutionary development in cryptography.
CS 70 Discrete Mathematics and Probability Theory
Fall 2013 Vazirani Note 5
Polynomials
Polynomials constitute a rich class of functions which are both easy to describe and widely applicable in
topics ranging from Fourier analysis to computational geometry. In this note, we will discuss properties of
polynomials which make them so useful. We will then describe how to take advantage of these properties to
develop a secret sharing scheme.
Recall from your high school math that a polynomial in a single variable is of the form $p(x) = a_d x^d + a_{d-1} x^{d-1} + \dots + a_0$. Here the variable x and the coefficients $a_i$ are usually real numbers. For example,
$p(x) = 5x^3 + 2x + 1$ is a polynomial of degree $d = 3$. Its coefficients are $a_3 = 5$, $a_2 = 0$, $a_1 = 2$, and $a_0 = 1$. Polynomials have some remarkably simple, elegant and powerful properties, which we will explore in this
note.
First, a definition: we say that a is a root of the polynomial p(x) if $p(a) = 0$. For example, the degree 2 polynomial $p(x) = x^2 - 4$ has two roots, namely 2 and $-2$, since $p(2) = p(-2) = 0$. If we plot the polynomial p(x) in the x-y plane, then the roots of the polynomial are just the places where the curve crosses the x axis.
We now state two fundamental properties of polynomials that we will prove in due course.
Property 1: A non-zero polynomial of degree $d$ has at most $d$ roots.
Property 2: Given $d+1$ pairs $(x_1, y_1), \ldots, (x_{d+1}, y_{d+1})$, with all the $x_i$ distinct, there is a unique polynomial $p(x)$ of degree (at most) $d$ such that $p(x_i) = y_i$ for $1 \le i \le d+1$.
Let us consider what these two properties say in the case that $d = 1$. A graph of a linear (degree 1) polynomial $y = a_1 x + a_0$ is a line. Property 1 says that if a line is not the $x$-axis (i.e. if the polynomial is not $y = 0$), then
it can intersect the $x$-axis in at most one point.
Property 2 says that two points uniquely determine a line.
Polynomial Interpolation
Property 2 says that two points uniquely determine a degree 1 polynomial (a line), three points uniquely
determine a degree 2 polynomial, four points uniquely determine a degree 3 polynomial, and so on. Given
$d+1$ pairs $(x_1, y_1), \ldots, (x_{d+1}, y_{d+1})$, how do we determine the polynomial $p(x) = a_d x^d + \cdots + a_1 x + a_0$ such
that $p(x_i) = y_i$ for $i = 1$ to $d+1$? We will give an efficient algorithm for reconstructing the coefficients $a_0, \ldots, a_d$, and therefore the polynomial $p(x)$.
The method is called Lagrange interpolation. Let us start by solving an easier problem. Suppose that
we are told that $y_1 = 1$ and $y_j = 0$ for $2 \le j \le d+1$. Now can we reconstruct $p(x)$? Yes, this is easy!
Consider $q(x) = (x - x_2)(x - x_3) \cdots (x - x_{d+1})$. This is a polynomial of degree $d$ (the $x_i$ are constants, and $x$ appears $d$ times). Also, we clearly have $q(x_j) = 0$ for $2 \le j \le d+1$. But what is $q(x_1)$? Well, $q(x_1) = (x_1 - x_2)(x_1 - x_3) \cdots (x_1 - x_{d+1})$, which is some constant not equal to 0, because the $x_i$ are distinct. Thus if we let $p(x) = q(x)/q(x_1)$ (dividing is ok since $q(x_1) \ne 0$), we have the polynomial we are looking for. For example, suppose you were given the pairs $(1, 1)$, $(2, 0)$, and $(3, 0)$. Then we can construct the degree $d = 2$ polynomial $p(x)$ by letting $q(x) = (x - 2)(x - 3) = x^2 - 5x + 6$, and $q(x_1) = q(1) = 2$. Thus, we can now construct $p(x) = q(x)/q(x_1) = (x^2 - 5x + 6)/2$.
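The construction above can be sketched in Python with exact rational arithmetic (`fractions.Fraction`). The names `poly_mul` and `first_basis` are hypothetical, and coefficient lists are assumed to be stored lowest degree first.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def first_basis(xs):
    """Return p(x) = q(x) / q(x_1), where q(x) = (x - x_2)...(x - x_{d+1}).

    xs = [x_1, ..., x_{d+1}] must be distinct; the result equals 1 at x_1
    and 0 at every other x_j.
    """
    q = [Fraction(1)]
    for xj in xs[1:]:
        q = poly_mul(q, [Fraction(-xj), Fraction(1)])  # multiply by (x - x_j)
    q_at_x1 = Fraction(1)
    for xj in xs[1:]:
        q_at_x1 *= xs[0] - xj                           # q(x_1), non-zero since xs are distinct
    return [c / q_at_x1 for c in q]


print([str(c) for c in first_basis([1, 2, 3])])         # ['3', '-5/2', '1/2']
```

The output lists the coefficients of $(x^2 - 5x + 6)/2 = \frac{1}{2}x^2 - \frac{5}{2}x + 3$, lowest degree first.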
Of course the problem is no harder if we single out some arbitrary index $i$ instead of 1: i.e. $y_i = 1$ and $y_j = 0$ for $j \ne i$. Let us introduce some notation: let us denote by $\Delta_i(x)$ the degree $d$ polynomial that goes through
these $d+1$ points. Then
$$\Delta_i(x) = \frac{\prod_{j \ne i} (x - x_j)}{\prod_{j \ne i} (x_i - x_j)}.$$
Let us now return to the original problem. Given $d+1$ pairs $(x_1, y_1), \ldots, (x_{d+1}, y_{d+1})$, we first construct the $d+1$ polynomials $\Delta_1(x), \ldots, \Delta_{d+1}(x)$. Now we can write
$$p(x) = \sum_{i=1}^{d+1} y_i \Delta_i(x).$$
Why does this work? First notice that $p(x)$ is a polynomial of degree at most $d$, as required, since it is a sum of polynomials of degree $d$. And when it is evaluated at $x_i$, $d$ of the $d+1$ terms in the sum evaluate to 0 and the $i$-th term evaluates to $y_i$ times 1, as required.
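A minimal sketch of the full Lagrange construction $p(x) = \sum_i y_i \Delta_i(x)$, again assuming coefficient lists stored lowest degree first and exact `Fraction` arithmetic; `lagrange_interpolate` is a hypothetical name, and the points $(0, 1), (1, 3), (2, 7)$ are an arbitrary illustration.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def lagrange_interpolate(points):
    """Coefficients (lowest degree first) of the unique polynomial of degree
    at most d through the d+1 points (x_i, y_i); the x_i must be distinct."""
    n = len(points)
    result = [Fraction(0)] * n
    for i, (xi, yi) in enumerate(points):
        # Build Delta_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j), one factor at a time.
        delta = [Fraction(1)]
        for j, (xj, _) in enumerate(points):
            if j != i:
                delta = poly_mul(delta, [Fraction(-xj, xi - xj), Fraction(1, xi - xj)])
        # Add y_i * Delta_i(x) to the running sum.
        for k, c in enumerate(delta):
            result[k] += yi * c
    return result


print([str(c) for c in lagrange_interpolate([(0, 1), (1, 3), (2, 7)])])  # ['1', '1', '1']
```

The result is $x^2 + x + 1$. The sketch favors clarity over speed: it builds each $\Delta_i$ explicitly, which costs $O(d^3)$ coefficient operations in total.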
As an example, suppose we want to find the degree-2 polynomial p(x)that passes through the three points
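$(1, 1)$, $(2, 2)$ and $(3, 4)$. Here $d = 2$ and $x_i = i$, so the three polynomials $\Delta_i$ are as follows:
$$\Delta_1(x) = \frac{(x-2)(x-3)}{(1-2)(1-3)} = \frac{(x-2)(x-3)}{2} = \frac{1}{2}x^2 - \frac{5}{2}x + 3;$$
$$\Delta_2(x) = \frac{(x-1)(x-3)}{(2-1)(2-3)} = \frac{(x-1)(x-3)}{-1} = -x^2 + 4x - 3;$$
$$\Delta_3(x) = \frac{(x-1)(x-2)}{(3-1)(3-2)} = \frac{(x-1)(x-2)}{2} = \frac{1}{2}x^2 - \frac{3}{2}x + 1.$$
The polynomial $p(x)$ is therefore given by
$$p(x) = 1 \cdot \Delta_1(x) + 2 \cdot \Delta_2(x) + 4 \cdot \Delta_3(x) = \frac{1}{2}x^2 - \frac{1}{2}x + 1.$$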
You should verify that this polynomial does indeed pass through the above three points.
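One way to carry out this verification mechanically: the sketch below (hypothetical helper `evaluate`, coefficients stored lowest degree first) checks that each $\Delta_i$ is 1 at $x = i$ and 0 at the other two points, and evaluates $p$ at $x = 1, 2, 3$.

```python
from fractions import Fraction


def evaluate(coeffs, x):
    """Horner evaluation; coeffs[i] is the coefficient of x**i."""
    result = Fraction(0)
    for a in reversed(coeffs):
        result = result * x + a
    return result


half = Fraction(1, 2)
delta = {
    1: [Fraction(3), -5 * half, half],              # Delta_1(x) = x^2/2 - 5x/2 + 3
    2: [Fraction(-3), Fraction(4), Fraction(-1)],   # Delta_2(x) = -x^2 + 4x - 3
    3: [Fraction(1), -3 * half, half],              # Delta_3(x) = x^2/2 - 3x/2 + 1
}
p = [Fraction(1), -half, half]                      # p(x) = x^2/2 - x/2 + 1

for x in (1, 2, 3):
    # Row x shows [Delta_1(x), Delta_2(x), Delta_3(x)] followed by p(x).
    print(x, [int(evaluate(delta[i], x)) for i in (1, 2, 3)], evaluate(p, x))
```

The printed rows are `1 [1, 0, 0] 1`, `2 [0, 1, 0] 2` and `3 [0, 0, 1] 4`, matching the three given points.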
Proof of Property 2
We would like to prove property 2:
Property 2: Given $d+1$ pairs $(x_1, y_1), \ldots, (x_{d+1}, y_{d+1})$, with all the $x_i$ distinct, there is a unique polynomial $p(x)$ of degree (at most) $d$ such that $p(x_i) = y_i$ for $1 \le i \le d+1$.
We have shown how to find a polynomial $p(x)$ such that $p(x_i) = y_i$ for $d+1$ pairs $(x_1, y_1), \ldots, (x_{d+1}, y_{d+1})$. This proves part of property 2 (the existence of the polynomial). How do we prove the second part, that the
polynomial is unique? Suppose for contradiction that there is another polynomial $q(x)$ of degree at most $d$ such that $q(x_i) = y_i$ for all $d+1$ pairs above. Now consider the polynomial $r(x) = p(x) - q(x)$. Since we are assuming that $q(x)$ and $p(x)$ are different polynomials, $r(x)$ must be a non-zero polynomial of degree at most $d$. Therefore, property 1 implies it can have at most $d$ roots. But on the other hand $r(x_i) = p(x_i) - q(x_i) = 0$ on $d+1$ distinct points. Contradiction. Therefore, $p(x)$ is the unique polynomial that satisfies the $d+1$ conditions.
Polynomial Division
Let's take a short digression to discuss polynomial division, which will be useful in the proof of property 1.
If we have a polynomial $p(x)$ of degree $d$, we can divide by a polynomial $q(x)$ of degree at most $d$ by using long division. The result will be:
$$p(x) = q'(x)q(x) + r(x)$$
where $q'(x)$ is the quotient and $r(x)$ is the remainder. The degree of $r(x)$ must be smaller than the degree of the divisor $q(x)$.
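Example. We wish to divide $p(x) = x^3 + x^2 - 1$ by $q(x) = x - 1$. Long division removes one leading term at a time:
Step 1: $x^3 / x = x^2$; subtracting $x^2(x - 1) = x^3 - x^2$ from $x^3 + x^2 - 1$ leaves $2x^2 - 1$.
Step 2: $2x^2 / x = 2x$; subtracting $2x(x - 1) = 2x^2 - 2x$ leaves $2x - 1$.
Step 3: $2x / x = 2$; subtracting $2(x - 1) = 2x - 2$ leaves $1$.
Now $p(x) = x^3 + x^2 - 1 = (x - 1)(x^2 + 2x + 2) + 1$: the remainder is $r(x) = 1$ and the quotient is $x^2 + 2x + 2$.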
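The long-division procedure can be sketched as follows, assuming coefficient lists stored lowest degree first; `poly_divmod` is a hypothetical name, and exact `Fraction` arithmetic avoids rounding when the divisor's leading coefficient is not 1.

```python
from fractions import Fraction


def poly_divmod(p, q):
    """Long division of polynomials given as coefficient lists (lowest degree first).

    Returns (quotient, remainder) with p = quotient * q + remainder and
    len(remainder) < len(q). The leading coefficient q[-1] must be non-zero.
    """
    remainder = [Fraction(c) for c in p]
    quotient = [Fraction(0)] * max(len(p) - len(q) + 1, 1)
    lead = Fraction(q[-1])
    while len(remainder) >= len(q):
        shift = len(remainder) - len(q)    # degree of the next quotient term
        factor = remainder[-1] / lead      # e.g. x^3 / x = x^2
        quotient[shift] = factor
        for i, c in enumerate(q):          # subtract factor * x^shift * q(x)
            remainder[shift + i] -= factor * c
        remainder.pop()                    # the leading term is now zero
    return quotient, remainder


quot, rem = poly_divmod([-1, 0, 1, 1], [-1, 1])   # (x^3 + x^2 - 1) / (x - 1)
print([str(c) for c in quot], [str(c) for c in rem])   # ['2', '2', '1'] ['1']
```

Each pass of the loop corresponds to one step of the hand computation: divide leading terms, subtract, and drop the cancelled term.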
Proof of Property 1
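Now let us turn to property 1: a non-zero polynomial of degree $d$ has at most $d$ roots. The idea of the proof
is as follows. We will prove the following claims:
Claim 1: If $a$ is a root of a polynomial $p(x)$ with degree $d$, then $p(x) = (x - a)q(x)$ for a polynomial $q(x)$ with degree $d - 1$.
Claim 2: A polynomial $p(x)$ of degree $d$ with distinct roots $a_1, \ldots, a_d$ can be written as $p(x) = c(x - a_1) \cdots (x - a_d)$.
Claim 2 implies property 1. If $p(x)$ has fewer than $d$ roots there is nothing to prove, so let $a_1, \ldots, a_d$ be $d$ distinct roots. We must show that any $a$ with $a \ne a_i$ for $i = 1, \ldots, d$ cannot be a root of $p(x)$. But this follows from claim 2, since $p(a) = c(a - a_1) \cdots (a - a_d) \ne 0$: the constant $c$ is non-zero because $p(x)$ has degree $d$, and no factor $a - a_i$ is zero.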
Proof of Claim 1
Dividing $p(x)$ by $(x - a)$ gives $p(x) = (x - a)q(x) + r(x)$, where $q(x)$ is the quotient and $r(x)$ is the remainder. The degree of $r(x)$ is necessarily smaller than the degree of the divisor $(x - a)$. Therefore $r(x)$ must have
degree 0 and therefore is some constant $c$. But now substituting $x = a$, we get $p(a) = c$. But since $a$ is a root, $p(a) = 0$. Thus $c = 0$ and therefore $p(x) = (x - a)q(x)$, thus showing that $(x - a) \mid p(x)$.
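When the divisor is the linear polynomial $x - a$, long division collapses to synthetic division, and the remainder it produces is exactly $p(a)$, which is the fact the proof above relies on. A sketch with the hypothetical name `divide_by_linear`, coefficients stored lowest degree first:

```python
def divide_by_linear(coeffs, a):
    """Divide p(x) by (x - a) via synthetic division.

    coeffs[i] is the coefficient of x**i (lowest degree first). Returns
    (quotient, remainder); the remainder always equals p(a), so it is 0
    exactly when a is a root.
    """
    carry = 0
    high_to_low = []
    for c in reversed(coeffs):       # process a_d, a_{d-1}, ..., a_0
        carry = carry * a + c
        high_to_low.append(carry)
    remainder = high_to_low.pop()    # the final carry is p(a)
    quotient = list(reversed(high_to_low))
    return quotient, remainder


print(divide_by_linear([-4, 0, 1], 2))       # ([2, 1], 0): x^2 - 4 = (x - 2)(x + 2)
print(divide_by_linear([-1, 0, 1, 1], 1))    # ([2, 2, 1], 1): remainder p(1) = 1
```

The first call shows Claim 1 in action: since 2 is a root of $x^2 - 4$, the remainder is 0 and the quotient $x + 2$ has degree $d - 1 = 1$.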
Claim 1 implies Claim 2
Proof by induction on $d$.
Base Case: We must show that a polynomial $p(x)$ of degree 1 with root $a_1$ can be written as $p(x) = c(x - a_1)$. By Claim 1, we know that $p(x) = (x - a_1)q(x)$, where $q(x)$ has degree 0 and is therefore a constant.
Inductive Hypothesis: A polynomial of degree $d - 1$ with distinct roots $a_1, \ldots, a_{d-1}$ can be written as $p(x) = c(x - a_1) \cdots (x - a_{d-1})$.
Inductive Step: Let $p(x)$ be a polynomial of degree $d$ with distinct roots $a_1, \ldots, a_d$. By Claim 1, $p(x) = (x - a_d)q(x)$ for some polynomial $q(x)$ of degree $d - 1$. Since $0 = p(a_i) = (a_i - a_d)q(a_i)$ for all $i \ne d$, and $a_i - a_d \ne 0$ in this case, $q(a_i)$ must be equal to 0. Then $q(x)$ is a polynomial of degree $d - 1$ with distinct roots $a_1, \ldots, a_{d-1}$. We can now apply the inductive hypothesis to $q(x)$ to write $q(x) = c(x - a_1) \cdots (x - a_{d-1})$. Substituting in $p(x) = (x - a_d)q(x)$, we finally obtain that $p(x) = c(x - a_1) \cdots (x - a_d)$.
Finite Fields
Both property 1 and property 2 also hold when the values of the coefficients and the variable $x$ are chosen
from the complex numbers instead of the real numbers or even the rational numbers. Property 2, however, can fail if the
values are restricted to being natural numbers or integers: for instance, no polynomial of degree at most 1 with integer coefficients passes through both $(0, 0)$ and $(2, 1)$, since the only line through them is $y = x/2$. Let us try to understand this a little more closely.
The only properties of numbers that we used in polynomial interpolation and in the proof of property 1 are
that we can add, subtract, multiply and divide any pair of numbers as long as we are not dividing by 0. We
cannot subtract two natural numbers and guarantee that the result is a natural number. And dividing two
integers does not usually result in an integer.
But if we work with numbers modulo a prime m, then we can add, subtract, multiply and divide (by any
non-zero number modulo $m$). To check this, recall that $x$ has an inverse mod $m$ if $\gcd(m, x) = 1$, so if $m$ is prime, all the numbers $\{1, \ldots, m - 1\}$ have an inverse mod $m$. So both property 1 and property 2 hold if the coefficients and the variable $x$ are restricted to take on values modulo $m$. This remarkable fact, that these
properties hold even when we restrict ourselves to a finite set of values, is the key to several applications that
we will presently see.
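To see the invertibility claim concretely, here is a sketch of the extended Euclidean algorithm (hypothetical name `inverse_mod`). Python 3.8 and later can also compute the same inverse with the built-in `pow(x, -1, m)`.

```python
from math import gcd


def inverse_mod(x, m):
    """Return the inverse of x modulo m using the extended Euclidean algorithm.

    Raises ValueError when gcd(x, m) != 1, i.e. when no inverse exists.
    """
    old_r, r = x % m, m
    old_s, s = 1, 0                   # invariant: old_s * x = old_r (mod m)
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_s, s = s, old_s - quotient * s
    if old_r != 1:
        raise ValueError(f"{x} has no inverse modulo {m}")
    return old_s % m


# Modulo the prime 7, every non-zero residue has an inverse.
print([inverse_mod(x, 7) for x in range(1, 7)])    # [1, 4, 5, 2, 3, 6]

# Modulo the composite 6, only the residues coprime to 6 are invertible.
print([x for x in range(1, 6) if gcd(x, 6) == 1])  # [1, 5]
```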
Let us consider an example of degree $d = 1$ polynomials modulo 5. Let $p(x) = 2x + 3 \pmod 5$. The roots of this polynomial are all values $x$ such that $2x + 3 \equiv 0 \pmod 5$ holds. Solving for $x$, we get that $2x \equiv -3 \equiv 2 \pmod 5$, or $x \equiv 1 \pmod 5$. Note that this is consistent with property 1 since we got only 1 root of a degree 1 polynomial.
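The same root can be found by computing $x \equiv -b \cdot a^{-1} \pmod m$ for $ax + b \equiv 0 \pmod m$. The sketch below assumes $m$ is prime, $a \not\equiv 0 \pmod m$, and Python 3.8+ for `pow(a, -1, m)`; `solve_linear_mod` is a hypothetical name. A brute-force scan over all 5 residues confirms there is a single root.

```python
def solve_linear_mod(a, b, m):
    """Return the unique root of a*x + b = 0 (mod m), for prime m and a % m != 0.

    The root is x = -b * a^{-1} (mod m); pow raises ValueError if a is not invertible.
    """
    return (-b * pow(a, -1, m)) % m


print(solve_linear_mod(2, 3, 5))                         # 1
print([x for x in range(5) if (2 * x + 3) % 5 == 0])     # [1]  (brute-force check)
```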
Now consider the polynomials $p(x) = 2x + 3$ and $q(x) = 3x - 2$ with all numbers reduced mod 5. We can plot the value of each polynomial $y$ as a function of $x$ in the $x$-$y$ plane. Since we are working modulo 5, there
are only 5 possible choices for $x$, and only 5 possible choices for $y$.
Notice that these two "lines" intersect in exactly one point, even though the picture looks nothing at all like
lines in the Euclidean plane! Looking at these graphs it might seem remarkable that both property 1 and
property 2 hold when we work modulo m for any prime number m. But as we stated above, all that was
required for the proofs of property 1 and 2 was the ability to add, subtract, multiply and divide any pair of
numbers (as long as we are not dividing by 0), and they hold whenever we work modulo a primem.
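Since the plot is not reproduced here, the sketch below tabulates the same points: the values of $p(x) = 2x + 3$ and $q(x) = 3x - 2$ reduced mod 5, and the places where they agree. The function names simply mirror the text.

```python
M = 5  # the prime modulus used in the example


def p(x):
    return (2 * x + 3) % M


def q(x):
    return (3 * x - 2) % M   # Python's % already maps -2 to 3


print("x   :", list(range(M)))
print("p(x):", [p(x) for x in range(M)])   # [3, 0, 2, 4, 1]
print("q(x):", [q(x) for x in range(M)])   # [3, 1, 4, 2, 0]
print("intersections:", [(x, p(x)) for x in range(M) if p(x) == q(x)])  # [(0, 3)]
```

The single intersection $(0, 3)$ agrees with solving $2x + 3 \equiv 3x - 2 \pmod 5$, which gives $x \equiv 5 \equiv 0$.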
When we work with numbers modulo a prime $m$, we say that we are working over a finite field, denoted
by $F_m$ or $GF(m)$ (for Galois Field). In order for a set to be called a field, it must satisfy certain axioms, which are the building blocks that allow for these amazing properties and others to hold. If you would like
to learn more about fields and the axioms they satisfy, you can read the Wikipedia article
on fields: http://en.wikipedia.org/wiki/Field_%28mathematics%29. While you are
there, you can also read the article on Galois Fields and learn more about some of their applications and
elegant properties which will not be covered in this lecture: http://en.wikipedia.org/wiki/Galois_field.
Counting
How many polynomials of degree (at most) 2 are there modulo $m$? This is easy: there are 3 coefficients,
each of which can take on one of $m$ values, for a total of $m^3$. Writing $p(x) = a_d x^d + a_{d-1} x^{d-1} + \cdots + a_0$ by specifying its $d+1$ coefficients $a_i$ is known as the coefficient representation of $p(x)$. Is there any other way to specify $p(x)$?
Sure, there is! Our polynomial of degree (at most) 2 is uniquely specified by its values at any three points, say
$x = 0, 1, 2$. Once again each of these three values can take on one of $m$ values, for a total of $m^3$ possibilities. In general, we can specify a degree $d$ polynomial $p(x)$ by specifying its values at $d+1$ points, say $0, 1, \ldots, d$. These $d+1$ values, $(y_0, y_1, \ldots, y_d)$, are called the value representation of $p(x)$. The coefficient representation
over such a devastating and destructive weapon. Suppose the U.S. government finally decides that a nuclear
strike can be initiated only if at least $k > 1$ major officials agree to it. We want to devise a scheme such that (1) any group of $k$ of these officials can pool their information to figure out the launch code and initiate the
strike but (2) no group of $k - 1$ or fewer have any information about the launch code, even if they pool their
knowledge. For example, they should not learn whether the secret is odd or even, a prime number, divisible
by some number $a$, or the secret's least significant bit. How can we accomplish this?
Suppose that there are $n$ officials indexed from 1 to $n$ and the launch code is some natural number $s$. Let $q$ be a prime number larger than $n$ and $s$. We will work over $GF(q)$ from now on.
Now pick a random polynomial $P(x)$ of degree $k - 1$ such that $P(0) = s$ and give $P(1)$ to the first official, $P(2)$ to the second, $\ldots$, $P(n)$ to the $n$th. Then
Any $k$ officials, having the values of the polynomial at $k$ points, can use Lagrange interpolation to find
$P$, and once they know what $P$ is, they can compute $P(0) = s$ to learn the secret. Another way to say this is that any $k$ officials have between them a value representation of the polynomial, which they can
convert to the coefficient representation, which allows them to evaluate $P(0) = s$.
Any group of $k - 1$ officials has no information about $s$. Together they know only $k - 1$ points through
which $P(x)$, an unknown polynomial of degree $k - 1$, passes. They wish to reconstruct $P(0)$. But by our discussion in the previous section, for each possible value $P(0) = b$, there is a unique polynomial of degree (at most) $k - 1$ that passes through the $k - 1$ points of the $k - 1$ officials as well as through $(0, b)$. Hence the secret could be any of the $q$ possible values $\{0, 1, \ldots, q - 1\}$, so the officials have, in a very precise sense, no information about $s$. Another way of saying this is that the information of the
officials is consistent with $q$ different value representations, one for each possible value of the secret,
and thus the officials have no information about s. (Note that this is the main reason we choose to
work over finite fields rather than, say, over the real numbers, where the basic secret-sharing scheme
would still work. Because there are only finitely many values in our field, we can quantify precisely
how many remaining possibilities there are for the value of the secret, and show that this is the same
as if the officials had no information at all.)
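A minimal sketch of the scheme just described, assuming Python 3.8+ (for `pow(x, -1, q)`) and using the standard `secrets` module for the random coefficients. The function names `make_shares` and `recover_secret` and the parameters in the usage lines (secret 1234, $k = 3$, $n = 5$, and the prime $q = 7919$) are illustrative choices, not part of the note. Recovery only needs $P(0)$, so it evaluates each $\Delta_i(0)$ directly instead of building the whole polynomial.

```python
import secrets


def make_shares(secret, k, n, q):
    """Split `secret` into n shares (i, P(i)) over GF(q); any k of them recover it.

    P(x) = secret + a_1 x + ... + a_{k-1} x^{k-1}, with each a_j uniform in GF(q).
    Assumes q is a prime larger than both n and secret.
    """
    coeffs = [secret] + [secrets.randbelow(q) for _ in range(k - 1)]
    shares = []
    for i in range(1, n + 1):
        value = 0
        for c in reversed(coeffs):          # Horner's rule, reduced mod q
            value = (value * i + c) % q
        shares.append((i, value))
    return shares


def recover_secret(shares, q):
    """Lagrange-interpolate P(0) over GF(q) from k shares with distinct x's."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * (0 - xj) % q    # numerator of Delta_i(0)
                den = den * (xi - xj) % q   # denominator of Delta_i(0)
        secret = (secret + yi * num * pow(den, -1, q)) % q
    return secret


shares = make_shares(secret=1234, k=3, n=5, q=7919)
print(recover_secret(shares[:3], 7919))                          # 1234
print(recover_secret([shares[0], shares[2], shares[4]], 7919))   # 1234
```

The outputs do not depend on the random coefficients: any three shares determine $P$, and hence $P(0) = 1234$.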
Example. Suppose you are in charge of setting up a secret sharing scheme, with secret $s = 1$, where you want to distribute $n = 5$ shares to 5 people such that any $k = 3$ or more people can figure out the secret, but two or fewer cannot. Let's say we are working over $GF(7)$ (note that $7 > s$ and $7 > n$) and you randomly choose the following polynomial of degree $k - 1 = 2$: $P(x) = 3x^2 + 5x + 1$ (here, $P(0) = 1 = s$, the secret). So you know everything there is to know about the secret and the polynomial, but what about the people
that receive the shares? Well, the shares handed out are $P(1) = 2$ to the first official, $P(2) = 2$ to the second, $P(3) = 1$ to the third, $P(4) = 6$ to the fourth, and $P(5) = 3$ to the fifth official. Let's say that officials 3, 4, and 5 get together (we expect them to be able to recover the secret). Using Lagrange interpolation, they
compute the following delta functions:
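$$\Delta_3(x) = \frac{(x-4)(x-5)}{(3-4)(3-5)} = \frac{(x-4)(x-5)}{2} = 4(x-4)(x-5);$$
$$\Delta_4(x) = \frac{(x-3)(x-5)}{(4-3)(4-5)} = \frac{(x-3)(x-5)}{-1} = 6(x-3)(x-5);$$
$$\Delta_5(x) = \frac{(x-3)(x-4)}{(5-3)(5-4)} = \frac{(x-3)(x-4)}{2} = 4(x-3)(x-4).$$
(Here we used $2^{-1} \equiv 4$ and $-1 \equiv 6 \pmod 7$.)
They then compute the polynomial over $GF(7)$: $P(x) = (1)\Delta_3(x) + (6)\Delta_4(x) + (3)\Delta_5(x) = 3x^2 + 5x + 1$
(verify the computation!). Now they simply compute $P(0)$ and discover that the secret is 1.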
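The computation the note asks you to verify can be reproduced with a short sketch of Lagrange interpolation over $GF(q)$; `poly_mul_mod` and `interpolate_mod` are hypothetical names, coefficient lists are stored lowest degree first, and `pow(den, -1, q)` requires Python 3.8+.

```python
def poly_mul_mod(a, b, q):
    """Multiply coefficient lists (lowest degree first) modulo q."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % q
    return out


def interpolate_mod(points, q):
    """Coefficients of the unique polynomial of degree < len(points) through
    `points` over GF(q), computed as sum_i y_i * Delta_i(x)."""
    result = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        delta, den = [1], 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                delta = poly_mul_mod(delta, [-xj % q, 1], q)   # times (x - x_j)
                den = den * (xi - xj) % q
        scale = yi * pow(den, -1, q) % q                       # y_i / prod_{j != i}(x_i - x_j)
        for k, c in enumerate(delta):
            result[k] = (result[k] + scale * c) % q
    return result


coeffs = interpolate_mod([(3, 1), (4, 6), (5, 3)], 7)
print(coeffs)       # [1, 5, 3], i.e. P(x) = 3x^2 + 5x + 1
print(coeffs[0])    # P(0) = 1, the secret
```

Any other three shares, such as those of officials 1, 2 and 3, yield the same coefficients.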
Let's see what happens if two officials try to get together, say persons 1 and 5. They both know that the
polynomial looks like $P(x) = a_2 x^2 + a_1 x + s$. They also know the following equations (over $GF(7)$, where $25 \equiv 4$):
$$P(1) = a_2 + a_1 + s = 2$$
$$P(5) = 4a_2 + 5a_1 + s = 3$$
But that is all they have: two equations with three unknowns, and thus they cannot find out the secret. This
is the case no matter which two officials get together. Notice that since we are working over $GF(7)$, the two people could have guessed the secret ($0 \le s \le 6$) and constructed a unique degree 2 polynomial (by
property 2). But the two people combined have the same chance of guessing what the secret is as they do
individually. This is important, as it implies that two people have no more information about the secret than
one person does.
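A brute-force check of this claim: the sketch below enumerates all $7^3$ triples $(a_2, a_1, s)$ over $GF(7)$ and keeps those consistent with the shares of officials 1 and 5. If the two shares leak nothing, every candidate secret should survive equally often.

```python
from collections import Counter

Q = 7
shares = {1: 2, 5: 3}     # officials 1 and 5 hold P(1) = 2 and P(5) = 3


def P(a2, a1, s, x):
    return (a2 * x * x + a1 * x + s) % Q


# Every (a2, a1, s) in GF(7)^3 that agrees with both shares.
consistent = [
    (a2, a1, s)
    for a2 in range(Q) for a1 in range(Q) for s in range(Q)
    if all(P(a2, a1, s, x) == y for x, y in shares.items())
]
per_secret = Counter(s for _, _, s in consistent)
print(len(consistent))               # 7
print(sorted(per_secret.items()))    # each secret 0..6 appears exactly once
```

Each secret appears exactly once because, for fixed $s$, the two equations in $a_2, a_1$ have determinant $1 \cdot 5 - 1 \cdot 4 = 1 \ne 0$ modulo 7.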
CS 70 Discrete Mathematics and Probability Theory
Fall 2013 Vazirani Note 6
Error Correcting Codes
We will consider two situations in which we wish to transmit information on an unreliable channel. The
first is exemplified by the internet, where the information (say a file) is broken up into packets, and the
unreliability is manifest in the fact that some of the packets are lost (or erased) during transmission. Moreover
the packets are labeled so that the recipient knows exactly which packets were received and which were
dropped. We will refer to such errors as erasure errors. See the figure below:
In the second situation, some of the packets are corrupted during transmission due to channel noise. Now the recipient has no idea which packets were corrupted and which were received unmodified:
In the above example, packets 1 and 4 are corrupted. These types of errors are called general errors. We will
discuss methods of encoding messages, called error correcting codes, which are capable of correcting both
erasure and general errors.
Assume that the information consists of $n$ packets. We can assume without loss of generality that the contents
of each packet are a number modulo $q$ (i.e., an element of $GF(q)$), where $q$ is a prime. For example, the contents of the packet might be a 32-bit string and can therefore be regarded as a number between 0 and $2^{32} - 1$; then
we could choose $q$ to be any prime larger than $2^{32}$. The properties of polynomials over $GF(q)$ (i.e., with coefficients and values reduced modulo $q$) are the backbone of both error-correcting schemes. To see this,
let us denote the message to be sent by $m_1, \ldots, m_n$ and make the following crucial observations:
1) There is a unique polynomial $P(x)$ of degree (at most) $n - 1$ such that $P(i) = m_i$ for $1 \le i \le n$ (i.e., $P(x)$ contains all of the information about the message, and evaluating $P(i)$ gives the contents of the $i$-th packet).
2) The message to be sent is now $m_1 = P(1), \ldots, m_n = P(n)$. We can generate additional packets by evaluating $P(x)$ at additional points $n+1, n+2, \ldots, n+j$ (remember, our transmitted message must be redundant, i.e., it must contain more packets than the original message to account for the lost or corrupted packets).
Thus the transmitted message is $c_1 = P(1), c_2 = P(2), \ldots, c_{n+j} = P(n+j)$. Since we are working modulo $q$, we must make sure that $n + j \le q$, but this condition does not impose a serious constraint since $q$ is very
large.
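The encoding step can be sketched as follows, assuming Python 3.8+ for `pow(den, -1, q)`; `interpolate_eval` and `encode` are hypothetical names, and the 4-packet message over $GF(11)$ is made up for illustration. Instead of computing the coefficients of $P(x)$, the sketch evaluates the Lagrange form directly at each point.

```python
def interpolate_eval(points, x, q):
    """Evaluate at x the unique polynomial of degree < len(points) through
    `points` over GF(q), using the Lagrange form sum_i y_i * Delta_i(x)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % q
                den = den * (xi - xj) % q
        total = (total + yi * num * pow(den, -1, q)) % q
    return total


def encode(message, j, q):
    """Return c_1..c_{n+j} = P(1..n+j), where P is the polynomial of degree
    at most n-1 with P(i) = m_i for i = 1..n. Requires n + j <= q."""
    n = len(message)
    assert n + j <= q, "evaluation points 1..n+j must be distinct mod q"
    points = list(zip(range(1, n + 1), message))    # the pairs (i, m_i)
    return [interpolate_eval(points, x, q) for x in range(1, n + j + 1)]


codeword = encode([5, 0, 9, 2], j=2, q=11)   # hypothetical 4-packet message
print(codeword)                              # [5, 0, 9, 2, 4, 7]
```

The first $n$ packets are the message itself; only the $j$ extra packets $P(5) = 4$ and $P(6) = 7$ carry the redundancy.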
Erasure Errors
Here we consider the setting of packets being transmitted over the internet. In this setting, the packets are
labeled and so the recipient knows exactly which packets were dropped during transmission. One additional
observation will be useful:
3) By Property 2 in Note 5, we can uniquely reconstruct $P(x)$ from its values at any $n$ distinct points, since it has degree (at most) $n - 1$. This means that $P(x)$ can be reconstructed from any $n$ of the transmitted packets. Evaluating this reconstructed polynomial $P(x)$ at $x = 1, \ldots, n$ yields the original message $m_1, \ldots, m_n$.
Recall that in our scheme, the transmitted message is $c_1 = P(1), c_2 = P(2), \ldots, c_{n+j} = P(n+j)$. Thus, if we hope to be able to correct $k$ erasure errors (lost packets), we simply need to set $j = k$. The encoded message will then consist of $n + k$ packets.
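A sketch of erasure recovery, assuming Python 3.8+ and using the hypothetical names `interpolate_eval` and `recover_message`. The codeword $[5, 0, 9, 2, 4, 7]$ is a made-up example: the message $5, 0, 9, 2$ over $GF(11)$ followed by $P(5) = 4$ and $P(6) = 7$. Dropping any $k = 2$ packets still leaves $n = 4$ points, which determine $P$.

```python
def interpolate_eval(points, x, q):
    """Evaluate at x the unique polynomial of degree < len(points) through
    `points` over GF(q), using the Lagrange form."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % q
                den = den * (xi - xj) % q
        total = (total + yi * num * pow(den, -1, q)) % q
    return total


def recover_message(received, n, q):
    """Recover m_1..m_n from any n received packets.

    `received` maps a packet index i (1-based) to c_i = P(i); lost packets are
    simply absent. By Property 2, any n of them determine P.
    """
    if len(received) < n:
        raise ValueError("need at least n packets")
    points = sorted(received.items())[:n]
    return [interpolate_eval(points, x, q) for x in range(1, n + 1)]


codeword = [5, 0, 9, 2, 4, 7]   # n = 4 message packets plus k = 2 extra, over GF(11)
received = {i + 1: c for i, c in enumerate(codeword) if i + 1 not in (1, 3)}  # packets 1 and 3 lost
print(recover_message(received, n=4, q=11))   # [5, 0, 9, 2]
```

Because the packets are labeled, the receiver knows which $x$ value goes with each surviving packet; that is exactly what distinguishes erasures from general errors.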
Example
Suppose Alice wants to send Bob a message of $n = 4$ packets and she wants to guard against $k = 2$ lost packets. Then, assuming the packets can be coded up as integers between 0 and 6, Alice can work over