Discrete Math and Probability Theory

  • 8/13/2019 Discrete Math and Probability Theory

    CS 70 Discrete Mathematics and Probability Theory

    Fall 2013 Vazirani Note 0

    Review of Sets and Mathematical Notation

    A set is a well-defined collection of objects. These objects are called elements or members of the set, and they can be anything, including numbers, letters, people, cities, and even other sets. By convention, sets are usually denoted by capital letters and can be described or defined by listing their elements and surrounding the list with curly braces. For example, we can describe the set $A$ to be the set whose members are the first five prime numbers, or we can explicitly write $A = \{2, 3, 5, 7, 11\}$. If $x$ is an element of $A$, we write $x \in A$. Similarly, if $y$ is not an element of $A$, then we write $y \notin A$. Two sets $A$ and $B$ are said to be equal, written as $A = B$, if they have the same elements. The order and repetition of elements do not matter, so {red, white, blue} = {blue, white, red} = {red, white, white, blue}. Sometimes, more complicated sets can be defined by using a different notation. For example, the set of all rational numbers, denoted by $\mathbb{Q}$, can be written as $\{\frac{a}{b} \mid a, b \text{ are integers}, b \neq 0\}$. In English, this is read as "the set of all fractions such that the numerator is an integer and the denominator is a non-zero integer."

    Cardinality

    We can also talk about the size of a set, or its cardinality. If $A = \{1, 2, 3, 4\}$, then the cardinality of $A$, denoted by $|A|$, is 4. It is possible for the cardinality of a set to be 0. This set is called the empty set, denoted by the symbol $\emptyset$. A set can also have an infinite number of elements, such as the set of all integers, prime numbers, or odd numbers.

    Subsets and Proper Subsets

    If every element of a set $A$ is also in set $B$, then we say that $A$ is a subset of $B$, written $A \subseteq B$. Equivalently we can write $B \supseteq A$, or $B$ is a superset of $A$. A proper subset is a set $A$ that is strictly contained in $B$, written as $A \subset B$, meaning that $A$ excludes at least one element of $B$. For example, consider the set $B = \{1, 2, 3, 4, 5\}$. Then $\{1, 2, 3\}$ is both a subset and a proper subset of $B$, while $\{1, 2, 3, 4, 5\}$ is a subset but not a proper subset of $B$. Here are a few basic properties regarding subsets:

    • The empty set is a proper subset of any nonempty set $A$: $\emptyset \subset A$.
    • The empty set is a subset of every set $B$: $\emptyset \subseteq B$.
    • Every set $A$ is a subset of itself: $A \subseteq A$.

    Intersections and Unions

    The intersection of a set $A$ with a set $B$, written as $A \cap B$, is a set containing all elements which are in both $A$ and $B$. Two sets are said to be disjoint if $A \cap B = \emptyset$. The union of a set $A$ with a set $B$, written as $A \cup B$, is a set of all elements which are in either $A$ or $B$ or both. For example, if $A$ is the set of all positive even numbers, and $B$ is the set of all positive odd numbers, then $A \cap B = \emptyset$, and $A \cup B = \mathbb{Z}^+$, or the set of all positive integers. Here are a few properties of intersections and unions:

    • $A \cup B = B \cup A$
    • $A \cup \emptyset = A$
    • $A \cap B = B \cap A$
    • $A \cap \emptyset = \emptyset$

    Complements

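    If $A$ and $B$ are two sets, then the relative complement of $A$ in $B$, written as $B - A$ or $B \setminus A$, is the set of elements in $B$, but not in $A$: $B \setminus A = \{x \in B \mid x \notin A\}$. For example, if $B = \{1, 2, 3\}$ and $A = \{3, 4, 5\}$, then $B \setminus A = \{1, 2\}$. For another example, if $\mathbb{R}$ is the set of real numbers and $\mathbb{Q}$ is the set of rational numbers, then $\mathbb{R} \setminus \mathbb{Q}$ is the set of irrational numbers. Here are some important properties of complements:

    • $A \setminus A = \emptyset$
    • $A \setminus \emptyset = A$
    • $\emptyset \setminus A = \emptyset$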

    Significant Sets

    In mathematics, some sets are referred to so commonly that they are denoted by special symbols. Some of these numerical sets include:

    • $\mathbb{N}$ denotes the set of all natural numbers: $\{0, 1, 2, 3, \ldots\}$.
    • $\mathbb{Z}$ denotes the set of all integers: $\{\ldots, -2, -1, 0, 1, 2, \ldots\}$.
    • $\mathbb{Q}$ denotes the set of all rational numbers: $\{\frac{a}{b} \mid a, b \in \mathbb{Z}, b \neq 0\}$.
    • $\mathbb{R}$ denotes the set of all real numbers.
    • $\mathbb{C}$ denotes the set of all complex numbers.

    In addition, the Cartesian product (also called the cross product) of two sets $A$ and $B$, written as $A \times B$, is the set of all pairs whose first component is an element of $A$ and whose second component is an element of $B$. In set notation, $A \times B = \{(a, b) \mid a \in A, b \in B\}$. For example, if $A = \{1, 2, 3\}$ and $B = \{u, v\}$, then $A \times B = \{(1, u), (1, v), (2, u), (2, v), (3, u), (3, v)\}$. Given a set $S$, another significant set is the power set of $S$, denoted by $\mathcal{P}(S)$, which is the set of all subsets of $S$: $\{T \mid T \subseteq S\}$. For example, if $S = \{1, 2, 3\}$, then the power set of $S$ is $\mathcal{P}(S) = \{\{\}, \{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}\}$. It is interesting to note that, if $|S| = k$, then $|\mathcal{P}(S)| = 2^k$.
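The set operations reviewed so far can be tried out with Python's built-in `set` type. This is only a minimal sketch: the helper name `power_set` is hypothetical and introduced purely for illustration, and subsets are stored as `frozenset` objects because Python sets cannot contain ordinary (mutable) sets.

```python
from itertools import chain, combinations, product

B = {1, 2, 3}
A = {3, 4, 5}

union = A | B             # elements in A or B or both
intersection = A & B      # elements in both A and B
complement = B - A        # relative complement of A in B

# Cartesian product of {1, 2, 3} and {"u", "v"}: all ordered pairs (a, b).
pairs = set(product({1, 2, 3}, {"u", "v"}))


def power_set(s):
    """Return all subsets of s as a set of frozensets (hypothetical helper)."""
    items = list(s)
    tuples = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(t) for t in tuples}


subsets = power_set({1, 2, 3})
print(complement, len(pairs), len(subsets))
```

The last line prints the relative complement together with the sizes of the Cartesian product and the power set, which can be compared with $|A| \cdot |B|$ and $2^k$.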

    Mathematical notation:

    Sums and Products:

    There is a compact notation for writing sums or products of large numbers of items. For example, to write $1 + 2 + \cdots + n$, without having to say "dot dot dot", we write it as $\sum_{i=1}^{n} i$. More generally we can write the sum $f(m) + f(m+1) + \cdots + f(n)$ as $\sum_{i=m}^{n} f(i)$. Thus, $\sum_{i=5}^{n} i^2 = 5^2 + 6^2 + \cdots + n^2$.

    To write the product $f(m) f(m+1) \cdots f(n)$, we use the notation $\prod_{i=m}^{n} f(i)$. For example, $\prod_{i=1}^{n} i = 1 \cdot 2 \cdots n$.
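The sum and product notation corresponds closely to a loop over the index range. The sketch below assumes Python 3.8 or later (for `math.prod`); the function names `summation` and `product_of` are illustrative and not part of the notes.

```python
from math import prod


def summation(f, m, n):
    """Sum f(i) for i = m, m + 1, ..., n (both ends inclusive)."""
    return sum(f(i) for i in range(m, n + 1))


def product_of(f, m, n):
    """Multiply f(i) for i = m, m + 1, ..., n (both ends inclusive)."""
    return prod(f(i) for i in range(m, n + 1))


n = 10
print(summation(lambda i: i, 1, n))       # 1 + 2 + ... + n
print(summation(lambda i: i * i, 5, n))   # 5^2 + 6^2 + ... + n^2
print(product_of(lambda i: i, 1, n))      # 1 * 2 * ... * n
```

Note that `range(m, n + 1)` is used because the upper limit in the mathematical notation is inclusive, while Python ranges exclude their endpoint.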

    Universal and existential quantifiers:

    Consider the statement: For all natural numbers $n$, $n^2 + n + 41$ is prime. Here, $n$ is quantified to any element of the set $\mathbb{N}$ of natural numbers. In notation, we write $(\forall n \in \mathbb{N})(n^2 + n + 41 \text{ is prime})$. Here we have used the universal quantifier $\forall$ ("for all"). Is the statement true? If you try to substitute small values of $n$, you will notice that $n^2 + n + 41$ is indeed prime for those values. But if you think harder, you can find larger values of $n$ for which it is not prime. Can you find one? So the statement $(\forall n \in \mathbb{N})(n^2 + n + 41 \text{ is prime})$ is false.
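A universally quantified claim is refuted by a single counterexample, and for a claim like this one a brute-force search is a reasonable way to look for it. The following is a minimal sketch; `is_prime` and `first_counterexample` are hypothetical helper names, and the search bound of 1000 is an arbitrary assumption.

```python
def is_prime(m):
    """Trial-division primality test; adequate for small m."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def first_counterexample(limit=1000):
    """Smallest natural number n < limit with n^2 + n + 41 composite, or None."""
    for n in range(limit):
        if not is_prime(n * n + n + 41):
            return n
    return None


print(first_counterexample())
```

Running the search is a quick way to check an answer found by hand.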

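    The existential quantifier $\exists$ ("there exists") can be combined with the universal quantifier, as in the following two statements:

    1. $(\forall x \in \mathbb{Z})(\exists y \in \mathbb{Z})(y > x)$
    2. $(\exists y \in \mathbb{Z})(\forall x \in \mathbb{Z})(y > x)$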

    The first statement says that, given an integer, I can find a larger one. The second statement says something

    very different: that there is a largest integer! The first statement is true, the second is not.

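    Inductive Step: Prove that it also holds for $n = k + 1$, i.e. $\sum_{i=0}^{k+1} i = \frac{(k+1)(k+2)}{2}$:

    $$\begin{aligned}
    \sum_{i=0}^{k+1} i &= \left(\sum_{i=0}^{k} i\right) + (k+1) \\
    &= \frac{k(k+1)}{2} + (k+1) \quad \text{(by the inductive hypothesis)} \\
    &= (k+1)\left(\frac{k}{2} + 1\right) \\
    &= \frac{(k+1)(k+2)}{2}.
    \end{aligned}$$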

    Hence, by the principle of induction, the theorem holds.

    Let's step back and look at the general form of such a proof, and also why it makes sense. Let us denote by $P(n)$ the statement $\sum_{i=0}^{n} i = \frac{n(n+1)}{2}$. So we wish to prove that $\forall n \in \mathbb{N}, P(n)$. The principle of induction asserts that you can prove $P(n)$ is true $\forall n \in \mathbb{N}$ by following these three steps:

    • Base Case: Prove that $P(0)$ is true.
    • Inductive Hypothesis: Assume that $P(k)$ is true.
    • Inductive Step: Show that it follows that $P(k+1)$ is true.

    To understand why induction works, think of the statements $P(n)$ as represented by a sequence of dominoes, numbered from $0, 1, 2, \ldots, n$, such that $P(0)$ corresponds to the 0th domino, $P(1)$ corresponds to the 1st domino, and so on. The dominoes are lined up so that if the $k$th domino is knocked over, then it in turn knocks over the $(k+1)$st. Knocking over the $k$th domino corresponds to proving $P(k)$ is true. And the induction step corresponds to the placement of the dominoes to ensure that if the $k$th domino falls, in turn it knocks over the $(k+1)$st domino. The base case ($n = 0$) knocks over the 0th domino, setting off a chain reaction that knocks down all the dominoes.

    It is worth examining more closely the induction proof example above. To prove $P(k+1)$, we find within it the statement $P(k)$: $\sum_{i=0}^{k+1} i = \left(\sum_{i=0}^{k} i\right) + (k+1)$. This is the key to the induction step.

    We will now look at another proof by induction, but first we will introduce some notation and a definition for divisibility. Given integers $a$ and $b$, we say that $a$ divides $b$ (or $b$ is divisible by $a$), written as $a \mid b$, if and only if for some integer $q$, $b = aq$. In mathematical notation, $\forall a, b \in \mathbb{Z}$, $a \mid b$ iff $\exists q \in \mathbb{Z} : b = aq$.

    CS 70, Fall 2013, Note 1 2

    Theorem: $\forall n \in \mathbb{N}$, $n^3 - n$ is divisible by 3.

    Proof (by induction over $n$): Let $P(n)$ denote the statement "$n^3 - n$ is divisible by 3."

    • Base Case: $P(0)$ asserts that $3 \mid (0^3 - 0)$ or $3 \mid 0$, which is true since any non-zero integer divides 0. (In this case, $0 = 3 \cdot 0$.)
    • Inductive Hypothesis: Assume $P(k)$ is true. That is, $3 \mid (k^3 - k)$, or $\exists q \in \mathbb{Z}, k^3 - k = 3q$.
    • Inductive Step: We must show that $P(k+1)$ is true, which asserts that $3 \mid ((k+1)^3 - (k+1))$. Let us expand this out:

    $$\begin{aligned}
    (k+1)^3 - (k+1) &= k^3 + 3k^2 + 3k + 1 - (k+1) \\
    &= (k^3 - k) + 3k^2 + 3k \\
    &= 3q + 3(k^2 + k), \quad q \in \mathbb{Z} \quad \text{(by the inductive hypothesis)} \\
    &= 3(q + k^2 + k)
    \end{aligned}$$

    So $3 \mid ((k+1)^3 - (k+1))$.

    Hence, by the principle of induction, $\forall n \in \mathbb{N}, 3 \mid (n^3 - n)$.

    There is a clever direct proof without any induction for the above statement. Can you see it?

    Two Color Theorem: There is a famous theorem called the four color theorem. It states that any map

    can be colored with four colors such that any two adjacent countries (which share a border, but not just

    a point) must have different colors. The four color theorem is very difficult to prove, and several bogus

    proofs were claimed since the problem was first posed in 1852. It was not until 1976 that the theorem was

    finally proved (with the aid of a computer) by Appel and Haken. (For an interesting history of the problem,

    and a state-of-the-art proof, which is nonetheless still very challenging, see www.math.gatech.edu/~thomas/FC/fourcolor.html). We consider a simpler scenario, where we divide the plane

    into regions by drawing lines, where each line divides the plane into two regions (i.e. it extends to infinity).

    We want to know if we can color this map using no more than two colors (say, red and blue) such that no

    two regions that share a boundary have the same color. Here is an example of a two-colored map:

    We will prove this "two color theorem" by induction on $n$, the number of lines:

    • Base Case: Prove that $P(0)$ is true, which is the proposition that a map with $n = 0$ lines can be colored using no more than two colors. But this is easy, since we can just color the entire plane using one color.
    • Inductive Hypothesis: Assume $P(n)$. That is, a map with $n$ lines can be two-colored.
    • Inductive Step: Prove $P(n+1)$. We are given a map with $n + 1$ lines and wish to show that it can be two-colored. Let's see what happens if we remove a line. With only $n$ lines on the plane, we know we can two-color the map (by the inductive hypothesis). Let us make the following observation: if we swap red $\leftrightarrow$ blue, we still have a two-coloring. With this in mind, let us place back the line we removed, and leave colors on one side of the line unchanged. On the other side of the line, swap red $\leftrightarrow$ blue. We claim that this is a valid two-coloring for the map with $n + 1$ lines.

    Why does this work? Any border of a region either consists of a part of one of the original $n$ lines or a piece of the $(n+1)$-st line. If it is a part of one of the original $n$ lines, then the two regions on either side are both on the same side of the $(n+1)$-st line, and the colors of the regions must be distinct, by the induction hypothesis. On the other hand, if the border is part of the $(n+1)$-st line, then the two regions were created by dividing a single region from the induction hypothesis, and by construction we reversed colors on one side of the line, and so they have opposite colors.
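Unrolling the inductive construction gives a direct rule for the coloring: start with everything red, and let each line flip the colors on one of its sides. The sketch below makes assumptions not in the notes: each line is represented by a hypothetical tuple `(a, b, c)` standing for $ax + by = c$, the "flipped" side is taken to be the side where $ax + by > c$, and query points are assumed not to lie on any line.

```python
def color(point, lines):
    """Color of the region containing `point` in the map cut out by `lines`.

    Each line is a tuple (a, b, c) standing for a*x + b*y = c (an assumed
    representation). Adding a line swaps red and blue on its positive side,
    so the final color depends only on the parity of the number of lines
    that have the point on that side.
    """
    x, y = point
    flips = sum(1 for (a, b, c) in lines if a * x + b * y > c)
    return "red" if flips % 2 == 0 else "blue"


lines = [(1, 0, 0), (0, 1, 0), (1, 1, 3)]   # x = 0, y = 0, x + y = 3

print(color((1, 1), lines), color((-1, 1), lines), color((2, 2), lines))
```

Two regions that share a border are separated by exactly one line, so their flip counts differ by one and their colors differ, which mirrors the argument in the inductive step.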

    Induction is a very powerful technique. But you will need to exercise care while using it, since even small

    errors can lead to proving ridiculously false statements. Here is a dramatic example: in the middle of the

    last century, a colloquial expression in common use was "that is a horse of a different color", referring to

    something that is quite different from normal or common expectation. The famous mathematician George

    Polya (who was also a great expositor of mathematics for the lay public) gave the following proof to show

    that there is no horse of a different color!

    Theorem: All horses are the same color.

    Proof (by induction on the number of horses):

    • Base Case: $P(1)$ is certainly true, since if you have a set containing just one horse, all horses in the set have the same color.
    • Inductive Hypothesis: Assume $P(n)$, which is the statement that in any set of $n$ horses, they all have the same color.
    • Inductive Step: Given a set of $n + 1$ horses $\{h_1, h_2, \ldots, h_{n+1}\}$, we can exclude the last horse in the set and apply the inductive hypothesis just to the first $n$ horses $\{h_1, \ldots, h_n\}$, deducing that they all have the same color. Similarly, we can conclude that the last $n$ horses $\{h_2, \ldots, h_{n+1}\}$ all have the same color. But now the "middle" horses $\{h_2, \ldots, h_n\}$ (i.e., all but the first and the last) belong to both of these sets, so they have the same color as horse $h_1$ and horse $h_{n+1}$. It follows, therefore, that all $n + 1$ horses have the same color. Thus, by the principle of induction, all horses have the same color.

    Clearly, it is not true that all horses are of the same color, so where did we go wrong in our induction proof?

    It is tempting to blame the induction hypothesis, which is clearly false. But the whole point of induction is that if the base case is true (which it is in this case), and assuming the induction hypothesis for any $n$ we can prove the case $n + 1$, then the statement is true for all $n$. So what we are looking for is a flaw in the reasoning!

    What makes the flaw in this proof a little tricky to spot is that the induction step is valid for a "typical" value of $n$, say, $n = 3$. The flaw, however, is in the induction step when $n = 1$. In this case, for $n + 1 = 2$ horses, there are no middle horses, and so the argument completely breaks down!

    Strengthening the Inductive Hypothesis

    Let us prove by induction the following proposition:

    Theorem: For all $n \geq 1$, the sum of the first $n$ odd numbers is a perfect square.

    Proof: By induction on $n$.

    • Base Case: $n = 1$. The first odd number is 1, which is a perfect square.
    • Inductive Hypothesis: Assume that the sum of the first $k$ odd numbers is a perfect square, say $m^2$.
    • Inductive Step: The $(k+1)$-th odd number is $2k + 1$, so by the induction hypothesis, the sum of the first $k + 1$ odd numbers is $m^2 + 2k + 1$. But now we are stuck. Why should $m^2 + 2k + 1$ be a perfect square?

    Well, let's just take a detour and compute the values of the first few cases. Maybe we will identify another pattern.

    • $n = 1$: $1 = 1^2$ is a perfect square.
    • $n = 2$: $1 + 3 = 4 = 2^2$ is a perfect square.
    • $n = 3$: $1 + 3 + 5 = 9 = 3^2$ is a perfect square.
    • $n = 4$: $1 + 3 + 5 + 7 = 16 = 4^2$ is a perfect square.

    Wait, isn't there a pattern where the sum of the first $n$ odd numbers is just $n^2$? Here is an idea: let us show something stronger!

    Theorem: For all $n \geq 1$, the sum of the first $n$ odd numbers is $n^2$.

    Proof: By induction on $n$.

    • Base Case: $n = 1$. The first odd number is 1, which is $1^2$.
    • Inductive Hypothesis: Assume that the sum of the first $k$ odd numbers is $k^2$.
    • Inductive Step: The $(k+1)$-st odd number is $2k + 1$, so by the induction hypothesis the sum of the first $k + 1$ odd numbers is $k^2 + (2k + 1) = (k + 1)^2$. Thus by the principle of induction the theorem holds.
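A quick numerical check is no substitute for the proof, but it is how the pattern was spotted in the first place. The sketch below tabulates the sum of the first $n$ odd numbers next to $n^2$; the function name `sum_of_first_odds` is illustrative only.

```python
def sum_of_first_odds(n):
    """Sum of the first n odd numbers 1, 3, 5, ..., 2n - 1."""
    return sum(2 * i - 1 for i in range(1, n + 1))


for n in range(1, 6):
    print(n, sum_of_first_odds(n), n * n)
```

The second and third columns agree on every row, which is exactly the stronger statement that the induction then establishes for all $n \geq 1$.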

    See if you can understand what happened here. We could not prove a proposition, so we proved a harder

    proposition instead! Can you see why that can sometimes be easier when you are doing a proof by induction?

    When you are trying to prove a stronger statement by induction, you have to show something harder in the induction step, but you also get to assume something stronger in the induction hypothesis. Sometimes the

    stronger assumption helps you reach just that much further...

    Here is another example:

    Imagine that we are given L-shaped tiles (i.e., a $2 \times 2$ square tile with a missing $1 \times 1$ square), and we want to know if we can tile a $2^n \times 2^n$ courtyard with a missing $1 \times 1$ square in the middle. Here is an example of a successful tiling in the case that $n = 2$:

    Let us try to prove the proposition by induction on $n$.

    • Base Case: Prove $P(1)$. This is the proposition that a $2 \times 2$ courtyard can be tiled with L-shaped tiles with a missing $1 \times 1$ square in the middle. But this is easy:
    • Inductive Hypothesis: Assume $P(n)$ is true. That is, we can tile a $2^n \times 2^n$ courtyard with a missing $1 \times 1$ square in the middle.
    • Inductive Step: We want to show that we can tile a $2^{n+1} \times 2^{n+1}$ courtyard with a missing $1 \times 1$ square in the middle. Let's try to reduce this problem so we can apply our inductive hypothesis. A $2^{n+1} \times 2^{n+1}$ courtyard can be broken up into four smaller courtyards of size $2^n \times 2^n$, each with a missing $1 \times 1$ square as follows:

    But the holes are not in the middle of each $2^n \times 2^n$ courtyard, so the inductive hypothesis does not help! How should we proceed? We should strengthen our inductive hypothesis!

    What we are about to do is completely counter-intuitive. It's like attempting to lift 100 pounds, failing, and then saying "I couldn't lift 100 pounds. Let me try to lift 200," and then succeeding! Instead of proving that we can tile a $2^n \times 2^n$ courtyard with a hole in the middle, we will try to prove something stronger: that we can tile the courtyard with the hole being anywhere we choose. It is a trade-off: we have to prove more, but we also get to assume a stronger hypothesis. The base case is the same, so we will just work on the inductive hypothesis and step.

    • Inductive Hypothesis (second attempt): Assume $P(n)$ is true, so that we can tile a $2^n \times 2^n$ courtyard with a missing $1 \times 1$ square anywhere.
    • Inductive Step (second attempt): As before, we can break up the $2^{n+1} \times 2^{n+1}$ courtyard as follows.

    By placing the first tile as shown, we get four $2^n \times 2^n$ courtyards, each with a $1 \times 1$ hole; three of these courtyards have the hole in one corner, while the fourth has the hole in a position determined by the hole in the $2^{n+1} \times 2^{n+1}$ courtyard. The stronger inductive hypothesis now applies to each of these four courtyards, so that each of them can be successfully tiled. Thus, we have proven that we can tile a $2^{n+1} \times 2^{n+1}$ courtyard with a hole anywhere! Hence, by the induction principle, we have proved the (stronger) theorem.
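The strengthened proof is constructive, so it translates directly into a recursive procedure. The following is a minimal sketch under assumptions of our own: the courtyard is a list of lists indexed as `grid[row][col]`, the hole is given as a `(row, col)` pair, and the name `tile_courtyard` is hypothetical.

```python
def tile_courtyard(n, hole):
    """Tile a 2^n x 2^n grid, minus the cell `hole`, with L-shaped tiles.

    Returns a grid of tile ids: the hole is marked 0 and every tile covers
    exactly three cells that share one positive id.
    """
    size = 2 ** n
    grid = [[None] * size for _ in range(size)]
    grid[hole[0]][hole[1]] = 0
    next_id = [1]  # mutable counter shared by the recursive calls

    def solve(top, left, size, hole_r, hole_c):
        if size == 1:
            return  # a single cell that is already the hole or covered
        half = size // 2
        mid_r, mid_c = top + half, left + half
        tile = next_id[0]
        next_id[0] += 1
        # Each quadrant, paired with its cell that touches the center.
        quadrants = [
            (top, left, mid_r - 1, mid_c - 1),
            (top, mid_c, mid_r - 1, mid_c),
            (mid_r, left, mid_r, mid_c - 1),
            (mid_r, mid_c, mid_r, mid_c),
        ]
        for q_top, q_left, center_r, center_c in quadrants:
            has_hole = (q_top <= hole_r < q_top + half
                        and q_left <= hole_c < q_left + half)
            if has_hole:
                # This quadrant keeps the existing hole.
                solve(q_top, q_left, half, hole_r, hole_c)
            else:
                # The central tile covers this quadrant's corner cell,
                # which then plays the role of the hole in the recursion.
                grid[center_r][center_c] = tile
                solve(q_top, q_left, half, center_r, center_c)

    solve(0, 0, size, hole[0], hole[1])
    return grid


for row in tile_courtyard(2, (1, 2)):
    print(row)
```

The recursion only works because each call accepts a hole in any position, which is the same reason the stronger inductive hypothesis was needed.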

    Strong Induction

    Strong induction is very similar to simple induction, with the exception of the inductive hypothesis. With strong induction, instead of just assuming $P(k)$ is true, you assume the stronger statement that $P(0)$, $P(1)$, ..., and $P(k)$ are all true (i.e., $P(0) \wedge P(1) \wedge \cdots \wedge P(k)$ is true, or in more compact notation $\bigwedge_{i=0}^{k} P(i)$ is true).

    Strong induction sometimes makes the proof of the inductive step much easier since we get to assume a stronger statement, as illustrated in the next example.

    Theorem: Every natural number $n > 1$ can be written as a product of primes.

    Recall that a number $n \geq 2$ is prime if 1 and $n$ are its only divisors. Let $P(n)$ be the proposition that $n$ can be written as a product of primes. We will prove that $P(n)$ is true for all $n \geq 2$.

    • Base Case: We start at $n = 2$. Clearly $P(2)$ holds, since 2 is a prime number.
    • Inductive Hypothesis: Assume $P(k)$ is true for $2 \leq k \leq n$: i.e., every number $k$ with $2 \leq k \leq n$ can be written as a product of primes.
    • Inductive Step: We must show that $n + 1$ can be written as a product of primes. We have two cases: either $n + 1$ is a prime number, or it is not. For the first case, if $n + 1$ is a prime number, then we are done. For the second case, if $n + 1$ is not a prime number, then by definition $n + 1 = xy$, where $x, y \in \mathbb{Z}^+$ and $1 < x, y < n + 1$. By the inductive hypothesis, $x$ and $y$ can each be written as a product of primes, so $n + 1 = xy$ can be as well.

    Why does this proof fail if we were to use simple induction? If we only assume $P(n)$ is true, then we cannot apply our inductive hypothesis to $x$ and $y$. For example, if we were trying to prove $P(42)$, we might write $42 = 6 \times 7$, and then it is useful to know that $P(6)$ and $P(7)$ are true. However, with simple induction, we could only assume $P(41)$, i.e., that 41 can be written as a product of primes, a fact that is not useful in establishing $P(42)$.

    To understand why strong induction works, let's think about our domino analogy. By the time we are ready for the $(k+1)$-st domino to fall, dominoes numbered 0 through $k$ have already been knocked over. But this is exactly what strong induction assumes: to prove $P(k+1)$, we can assume we already know that $P(0)$ through $P(k)$ are true.
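The strong-induction proof that every number greater than 1 is a product of primes has the shape of a recursive function: either the number is prime, or it splits as $xy$ with both factors strictly smaller, and the procedure is applied to each factor. The sketch below assumes Python 3.8 or later (for `math.isqrt`); the name `prime_factors` is illustrative.

```python
from math import isqrt


def prime_factors(n):
    """Return a list of primes whose product is n, for an integer n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    for x in range(2, isqrt(n) + 1):
        if n % x == 0:
            y = n // x
            # n = x * y with 1 < x, y < n: recurse on both factors, just as
            # the strong inductive hypothesis is applied to x and y.
            return prime_factors(x) + prime_factors(y)
    return [n]  # no divisor between 2 and sqrt(n), so n is prime


print(prime_factors(42))
```

The recursive calls are made on arbitrary smaller numbers rather than on `n - 1`, which is why the argument needs the hypothesis for all smaller values and not just the previous one.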

    Simple Induction vs. Strong Induction

    We have seen that strong induction makes certain proofs easy when simple induction seems to fail. A natural

    question to ask then, is whether the strong induction axiom is logically stronger than the simple induction

    axiom. In fact, the two methods of induction are logically equivalent. Clearly anything that can be proven by

    simple induction can also be proven by strong induction (convince yourself of this!). For the other direction,

    suppose we can prove by strong induction that $\forall n\, P(n)$. Let $Q(k) = P(0) \wedge \cdots \wedge P(k)$. Let us prove $\forall k\, Q(k)$ by simple induction. The proof is modeled after the strong induction proof of $\forall n\, P(n)$. That is, we want to show $Q(k) \Rightarrow Q(k+1)$, or equivalently $P(0) \wedge \cdots \wedge P(k) \Rightarrow P(0) \wedge \cdots \wedge P(k) \wedge P(k+1)$. But this is true iff $P(0) \wedge \cdots \wedge P(k) \Rightarrow P(k+1)$. This is exactly what the strong induction proof of $\forall n\, P(n)$ establishes! Therefore, we can establish $\forall n\, Q(n)$ by simple induction. And clearly, proving $\forall n\, Q(n)$ also proves $\forall n\, P(n)$.

    Well Ordering Principle

    In the context of proving statements about algorithms or programs, it is often convenient to formulate an induction proof in a different way. We start by asking how the statement $\forall n \in \mathbb{N}, P(n)$ could fail. Well, it means that there must be some values of $n$ for which $P(n)$ is false. Let $m$ be the smallest such natural number. We know that $m$ must be greater than 0 since $P(0)$ is true (base case), which indicates $m - 1 \in \mathbb{N}$. Since $m$ is the smallest input that makes $P(m)$ false, $P(m-1)$ must be true. But $P(m-1) \Rightarrow P(m)$, which is a contradiction.

    We assumed something when defining $m$ that is usually taken for granted: that we can actually find a smallest number in any non-empty subset of natural numbers. This property does not hold for, say, the real numbers; to see why, consider the set $\{x \in \mathbb{R} : 0 < x < 1\}$, which has no smallest element.

    Let's look at an example.

    Round robin tournament: Suppose that, in a round robin tournament, we have a set of $k$ players $\{p_1, p_2, \ldots, p_k\}$ such that $p_1$ beats $p_2$, $p_2$ beats $p_3$, ..., $p_{k-1}$ beats $p_k$, and $p_k$ beats $p_1$. This is called a cycle in the tournament.

    (A round robin tournament is a tournament where each participant plays every other contestant exactly once. Thus, if there are $n$ players, there will be exactly $\frac{n(n-1)}{2}$ matches. Also, we are assuming that every match ends in either a win or a loss; no ties.)

    Claim: If there exists a cycle in a tournament, then there exists a cycle of length 3.

    Proof: For the base case, notice that we cannot have a cycle of length less than 3, and if there is a cycle of length 3 then the proposition is true.

    Assume for contradiction that the smallest cycle is $p_1, p_2, \ldots, p_n$ with $n > 3$. Let us look at the game between $p_1$ and $p_3$. We have two cases: either $p_3$ beats $p_1$, or $p_1$ beats $p_3$. In the first case (where $p_3$ beats $p_1$), we are done because $p_1, p_2, p_3$ form a 3-cycle. In the second case (where $p_1$ beats $p_3$), we have a shorter cycle $\{p_1, p_3, p_4, \ldots, p_n\}$ and thus a contradiction. Therefore, if there exists a cycle, then there must exist a 3-cycle as well.
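The case analysis on the match between $p_1$ and $p_3$ can be read as a procedure that keeps shortening a cycle until a 3-cycle appears. The sketch below rests on assumed inputs: `cycle` is a list of players in which each beats the next and the last beats the first, and `beats(a, b)` is a hypothetical function returning `True` when `a` beat `b`. The sample tournament is made up for illustration.

```python
def find_three_cycle(cycle, beats):
    """Shrink a cycle in a round robin tournament down to a 3-cycle."""
    cycle = list(cycle)
    while len(cycle) > 3:
        p1, p2, p3 = cycle[0], cycle[1], cycle[2]
        if beats(p3, p1):
            return [p1, p2, p3]   # p1 beats p2, p2 beats p3, p3 beats p1
        # Otherwise p1 beats p3, so dropping p2 leaves a shorter cycle.
        del cycle[1]
    return cycle


# A made-up 5-player tournament containing the cycle 0, 1, 2, 3, 4.
wins = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 0),
        (0, 2), (0, 3), (1, 3), (1, 4), (2, 4)}


def beats(a, b):
    return (a, b) in wins


print(find_three_cycle([0, 1, 2, 3, 4], beats))
```

Each pass either returns a 3-cycle or removes one player, so the loop stops after at most $n - 3$ steps.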

    Can we prove this claim using more traditional induction? Let us start with the base case of $n = 3$ players and proceed from there.

    Proof: By induction on $n$.

    • Base Case: As above.
    • Inductive Hypothesis: If a round-robin tournament has a cycle of length $k$ then it has a cycle of length 3.
    • Inductive Step: Given a round-robin tournament with a cycle of length $k + 1$, we wish to show there must be a 3-cycle. Assume wlog that the cycle involves players $p_1$ through $p_{k+1}$ in that order. Consider the outcome of the match between $p_1$ and $p_3$. If $p_3$ beats $p_1$ then we have a 3-cycle. If $p_1$ beats $p_3$, there is a $k$-cycle that goes directly from $p_1$ to $p_3$ and continues as before. Applying the induction hypothesis, we conclude that there must be a 3-cycle in the tournament.

    Induction and Recursion

    There is an intimate connection between induction and recursion in mathematics and computer science. A recursive definition of a function over the natural numbers specifies the value of the function at small values of $n$, and defines the value of $f(n)$ for a general $n$ in terms of the value of $f(m)$ for $m < n$.

    Can you figure out how long this program takes to compute $F(n)$? This is a very inefficient way to compute the $n$-th Fibonacci number. A much faster way is to turn this into an iterative algorithm (this should be a familiar example of turning a tail-recursion into an iterative algorithm):

    function F2(n)
      if n = 0 then return 0
      if n = 1 then return 1
      a = 1
      b = 0
      for k = 2 to n do
        temp = a
        a = a + b
        b = temp
      return a

    Can you show by induction that this new function $F2(n) = F(n)$? How long does this program take to compute $F(n)$?
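To experiment with both approaches, here is a minimal Python sketch. It assumes the standard Fibonacci definition $F(0) = 0$, $F(1) = 1$, $F(n) = F(n-1) + F(n-2)$, which matches the base cases of F2; the names `fib_recursive` and `fib_iterative` are illustrative.

```python
def fib_recursive(n):
    """Naive recursion: F(n) = F(n-1) + F(n-2), with F(0) = 0 and F(1) = 1."""
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_iterative(n):
    """Iterative version following F2: a holds F(k) and b holds F(k-1)."""
    if n == 0:
        return 0
    if n == 1:
        return 1
    a, b = 1, 0
    for _ in range(2, n + 1):
        a, b = a + b, a
    return a


print([fib_iterative(n) for n in range(10)])
```

The loop invariant in the comment (after the iteration for `k`, `a` is $F(k)$ and `b` is $F(k-1)$) is the statement one would prove by induction to show the two functions agree.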

    Clearly, induction and recursion are closely related. In fact, proofs involving a recursively-defined concept, e.g. factorial, are often best done using induction. Formally, the factorial of a nonnegative number $n$ is defined recursively as $n! = n(n-1)(n-2)\cdots 1$, with a base case $0! = 1$, whereas exponentiation is defined recursively as $x^n = x^{n-1} \cdot x$. In this next example, we will look at an inequality between two functions of $n$. Such inequalities are useful in computer science when showing that one algorithm is more efficient than another.

    Notice that for this example, we have chosen as our base case $n = 2$ rather than $n = 0$. This is because the statement is trivially true for n1= n!

    Practice Problems

    1. Prove for any natural number $n$ that $1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{1}{6} n(n+1)(2n+1)$.
    2. Prove that $3^n > 2^n$ for all natural numbers $n \geq 1$.
    3. In real analysis, Bernoulli's Inequality is an inequality which approximates the exponentiations of $1 + x$. Prove this inequality, which states that $(1 + x)^n \geq 1 + nx$ if $n$ is a natural number and $1 + x > 0$.

    CS 70 Discrete Mathematics and Probability Theory

    Fall 2013 Vazirani Note 2

    The Stable Marriage Problem: Induction Proofs in Algorithms

    A dating agency must match up $n$ men and $n$ women. Each man has an ordered preference list of the $n$ women, and each woman has a similar list of the $n$ men. Is there a good algorithm to pair them up?

    Consider for example $n = 3$ men represented by numbers 1, 2, and 3 and 3 women A, B, and C, with the following preference lists:

    Men   Women
    1     A B C
    2     B A C
    3     A B C

    Women   Men
    A       2 1 3
    B       1 2 3
    C       1 2 3

    There are many possible pairings for this example, two of which are {(1,A), (2,B), (3,C)} and {(1,B), (2,C),

    (3,A)}. How do we decide which pairing to choose? Let us look at an algorithm for this problem that is

    simple, fast, and widely-used.

    The Stable Marriage Algorithm [1]

    • Every Morning: Each man goes to the first woman on his list not yet crossed off and proposes to her.
    • Every Afternoon: Each woman says "maybe, come back tomorrow" to the man she likes best among the proposals (she now has him on a string) and "never" to all the rest.
    • Every Evening: Each rejected suitor crosses off the woman who rejected him from his list.

    The above loop is repeated each successive day until there are no more rejected suitors. On this day, each woman marries the man she has on a string.
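The daily loop above can be simulated directly. The following is a minimal sketch rather than a definitive implementation: it assumes equal numbers of men and women with complete preference lists stored as Python dictionaries (best choice first), and the names `stable_marriage`, `men_prefs` and `women_prefs` are illustrative.

```python
def stable_marriage(men_prefs, women_prefs):
    """Run the propose-and-reject procedure day by day.

    Returns (pairing, days), where pairing maps each woman to the man
    she marries and days counts the days the procedure ran.
    """
    # rank[w][m] is the position of m on w's list (smaller is better).
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    # Index of the first woman on each man's list not yet crossed off.
    next_choice = {m: 0 for m in men_prefs}
    days = 0
    while True:
        days += 1
        # Morning: each man proposes to his best remaining choice.
        proposals = {w: [] for w in women_prefs}
        for m, prefs in men_prefs.items():
            proposals[prefs[next_choice[m]]].append(m)
        # Afternoon: each woman keeps her favorite and rejects the rest.
        on_string = {}
        rejected = []
        for w, suitors in proposals.items():
            if suitors:
                best = min(suitors, key=lambda m: rank[w][m])
                on_string[w] = best
                rejected.extend(m for m in suitors if m != best)
        if not rejected:
            return on_string, days
        # Evening: each rejected suitor crosses off the woman who rejected him.
        for m in rejected:
            next_choice[m] += 1


men_prefs = {1: ["A", "B", "C"], 2: ["B", "A", "C"], 3: ["A", "B", "C"]}
women_prefs = {"A": [2, 1, 3], "B": [1, 2, 3], "C": [1, 2, 3]}

pairing, days = stable_marriage(men_prefs, women_prefs)
print(pairing, days)
```

On the three-couple example above, the simulation pairs 1 with A, 2 with B and 3 with C after three days: man 3 is rejected first by A and then by B before proposing to C.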

    How is this algorithm used in the real world?

    [1] This algorithm is based on a 1950s model of dating where the men propose to the women, and the women accept or reject these proposals.

    The Residency Match

    Perhaps the most well-known application of the Stable Marriage Algorithm is the residency match program,

    which pairs medical school graduates and residency slots (internships) at teaching hospitals. Graduates

    and hospitals submit their ordered preference lists, and the stable pairing produced by a computer matches

    students with residency programs.

    The road to the residency match program was long and twisted. Medical residency programs were first

    introduced about a century ago. Since interns offered a source of cheap labor for hospitals, soon the number

    of residency slots exceeded the number of medical graduates, resulting in fierce competition. Hospitals tried

    to outdo each other by making their residency offers earlier and earlier. By the mid-40s, offers for residency

    were being made by the beginning of junior year of medical school, and some hospitals were contemplating

    even earlier offers to sophomores! The American Medical Association finally stepped in and prohibited

    medical schools from releasing student transcripts and reference letters until their senior year. This sparked

    a new problem, with hospitals now making "short fuse" offers to make sure that if their offer was rejected

    they could still find alternate interns to fill the slot. Once again the competition between hospitals led to an

    unacceptable situation, with students being given only a few hours to decide whether they would accept an

    offer.

    Finally, in the early 50s, this unsustainable situation led to the centralized system called the National Residency Matching Program (N.R.M.P.) in which the hospitals ranked the residents and the residents ranked

    the hospitals. The N.R.M.P. then produced a pairing between the applicants and the hospitals, though at

    first this pairing was not stable. It was not until 1952 that the N.R.M.P. switched to the Stable Marriage

    Algorithm, resulting in a stable pairing.

    Most recently, Lloyd Shapley and Alvin Roth won the 2012 Nobel Prize in Economic Sciences by extending the Stable Marriage Algorithm we study in this lecture! [2]

    Properties of the Stable Marriage Algorithm

    We wish to show that the stable marriage algorithm is fast and finds a good pairing. But first, we must show that it halts. Here is a simple argument: on each day that the algorithm does not halt, at least one man must eliminate some woman from his list (otherwise the halting condition for the algorithm would be invoked). Since each list has $n$ elements, and there are $n$ lists, this means that the algorithm must terminate in at most $n^2$ days. Next, we need to show that the Stable Marriage Algorithm finds a good pairing. Before we do this, we should discuss what we consider to be a good pairing.

    Stability

    What properties should a good pairing have? One possible criterion for a good pairing is one in which the

    number of first ranked choices is maximized. Another possibility is to minimize the number of last ranked

    choices. Or perhaps minimizing the sum of the ranks of the choices, which may be thought of as maximizing

    the average happiness.

    [2] See http://www.nobelprize.org/nobel_prizes/economic-sciences/laureates/2012/popular-economicsciences2012.pdf for more details.

    In this lecture we will focus on a very basic criterion: stability. A pairing is unstable if there is a man and a

    woman who prefer each other to their current partners. We will call such a pair a rogue couple. So a pairing

    of n men and n women is stable if it has no rogue couples.

    An unstable pairing from the example given in the beginning is: {(1,C), (2,B), (3,A)}. The reason is that 1

    and B form a rogue couple, since 1 would rather be with B than C (his current partner), and since B would

    rather be with 1 than 2 (her current partner).

    An example of a stable pairing is: {(2,A), (1,B), (3,C)}. Note that (1,A) is not a rogue couple. It is true that man 1 would rather be with woman A than his current partner. Unfortunately for him, she would rather be

    with her current partner than with him. Note also that both 3 and C are paired with their least favorite choice

    in this matching. Nonetheless, it is a stable pairing, since none of their preferred choices would rather be

    with them.
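The definition of a rogue couple translates directly into a check. The following is a minimal sketch, assuming the 3-by-3 preference lists of the running example; the names `MEN`, `WOMEN` and `rogue_couples` are illustrative and do not come from the notes.

```python
# Preference lists from the running example; earlier in a list means more preferred.
MEN = {1: ["A", "B", "C"], 2: ["B", "A", "C"], 3: ["A", "B", "C"]}
WOMEN = {"A": [2, 1, 3], "B": [1, 2, 3], "C": [1, 2, 3]}


def rogue_couples(pairing, men=MEN, women=WOMEN):
    """Return every (man, woman) who prefer each other to their current partners."""
    wife = dict(pairing)                      # man -> woman
    husband = {w: m for m, w in pairing}      # woman -> man
    rogues = []
    for m, prefs in men.items():
        # Only women ranked strictly above m's current partner can tempt him.
        for w in prefs[:prefs.index(wife[m])]:
            if women[w].index(m) < women[w].index(husband[w]):
                rogues.append((m, w))
    return rogues


print(rogue_couples([(1, "C"), (2, "B"), (3, "A")]))  # the unstable pairing from the text
print(rogue_couples([(2, "A"), (1, "B"), (3, "C")]))  # the stable pairing from the text
```

A pairing is stable exactly when this function returns an empty list. For the unstable pairing the list contains (1, B), the rogue couple named above.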

    Before we discuss how to find a stable pairing, let us ask a more basic question: do stable pairings always

    exist? Surely the answer is yes, since we could start with any pairing and make it more and more stable as

    follows: if there is a rogue couple, modify the current pairing so that they are together. Repeat. Surely this

    procedure must result in a stable pairing! Unfortunately this reasoning is not sound. To demonstrate this,

    let us consider a slightly different scenario, the roommates problem. Here we have $2n$ people who must be

    paired up to be roommates (the difference being that unlike the dating scenario, a person can be paired with

    any of the remaining $2n-1$). The point is that nothing about the intuition about the progress made by the stable marriage algorithm relied on the fact that men can only be paired with women, so the same intuition

    should apply to the roommates problem as well. The following counter-example illustrates the fallacy in the

    reasoning:

    of the Stable Marriage algorithm. We will use the preference lists given earlier, which are duplicated here

    for convenience:

    Men   Women
    1     A B C
    2     B A C
    3     A B C

    Women   Men
    A       2 1 3
    B       1 2 3
    C       1 2 3

    The following table shows which men propose to which women on the given day (the circled men are the

    ones who were told maybe):

    Thus, the stable pairing which the algorithm outputs is: {(1,A), (2,B), (3,C)}.
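The run on these preference lists can be reproduced with a short simulation. This is only a sketch of the propose-and-reject procedure, assuming equally many men and women with complete preference lists; the function name `stable_marriage` and the per-day log `days` are illustrative choices.

```python
MEN = {1: ["A", "B", "C"], 2: ["B", "A", "C"], 3: ["A", "B", "C"]}
WOMEN = {"A": [2, 1, 3], "B": [1, 2, 3], "C": [1, 2, 3]}


def stable_marriage(men, women):
    """Propose-and-reject with men proposing; returns (pairing, log of daily proposals)."""
    next_choice = {m: 0 for m in men}    # index of the best woman who has not rejected m
    days = []
    while True:
        # Each man proposes to the first woman on his list who has not rejected him.
        proposals = {}
        for m in men:
            proposals.setdefault(men[m][next_choice[m]], []).append(m)
        days.append(proposals)
        # Each woman says "maybe" to her favorite suitor and rejects the rest.
        rejected = []
        for w, suitors in proposals.items():
            suitors.sort(key=women[w].index)
            rejected.extend(suitors[1:])
        if not rejected:                 # halting condition: a day with no rejections
            return sorted((ms[0], w) for w, ms in proposals.items()), days
        for m in rejected:
            next_choice[m] += 1          # cross the rejecting woman off his list


pairing, days = stable_marriage(MEN, WOMEN)
print(pairing)
for day, proposals in enumerate(days, start=1):
    print(day, proposals)
```

In each day's log the first man listed for a woman is the one she told "maybe". The loop ends on the first day without a rejection, which is the halting condition used in the termination argument.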

    Theorem: The pairing produced by the algorithm is always stable.

    Proof: We will show that no man M can be involved in a rogue couple. Consider any couple (M,W) in

    the pairing and suppose that M prefers some woman W* to W. We will argue that W* prefers her partner

    to M, so that (M,W*) cannot be a rogue couple. Since W* occurs before W in M's list, he must have

    proposed to her before he proposed to W. Therefore, according to the algorithm, W* must have rejected him

    for somebody she prefers. By the Improvement Lemma, W* likes her final partner at least as much, and therefore prefers him to M. Thus no man M can be involved in a rogue couple, and the pairing is stable.

    Optimality

    Consider the situation in which there are 4 men and 4 women with the following preference lists:

    Men   Women
    1     A B C D
    2     A D C B
    3     A C B D
    4     A B C D

    Women   Men
    A       1 3 2 4
    B       4 3 2 1
    C       2 3 1 4
    D       3 4 2 1

    For these preference lists, there are exactly two stable pairings: S = {(1,A), (2,D), (3,C), (4,B)} and T = {(1,A), (2,C), (3,D), (4,B)}. The fact that there is more than one stable pairing brings up an interesting

    question. What is the best possible partner for each person, say man 2 for example?

    The trivial answer is his first choice (i.e., woman A), but that is just not a realistic possibility for him. Pairing

    man 2 with woman A would simply not be stable, since he is so low on her preference list. And indeed there

    is no stable pairing in which 2 is paired with A. Examining the two stable pairings, we can see that the best

    possible realistic outcome for man 2 is to be matched to woman D.

    Let us make some definitions to better express these ideas: we say the optimal woman for a man is the

    highest woman on his list whom he is paired with in any stable pairing. In other words, the optimal woman

    is the best that a man can do under the condition of stability. In the above example, woman D is the optimal

    woman for man 2. So the best that each man can hope for is to be paired with his optimal woman. But can

    they achieve this optimality simultaneously? I.e., is there a stable pairing such that each man is paired with

    his optimal woman? If such a pairing exists, we will call it a male optimal pairing. Turning to the example

    above, S is a male optimal pairing since A is 1's optimal woman, D is 2's optimal woman, C is 3's optimal

    woman, and B is 4's optimal woman. Similarly, we can define a female optimal pairing, which is the pairing

    in which each woman is paired with her optimal man. [Exercise: Check that T is a female optimal pairing.]
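For an instance this small, the claims about S and T can be checked by brute force. The sketch below enumerates all 4! pairings and keeps the stable ones; the helper names `is_stable`, `stable_pairings` and `optimal_woman` are illustrative, and the exhaustive search is only meant for tiny examples.

```python
from itertools import permutations

# Preference lists of the 4-by-4 example; a string lists partners from best to worst.
MEN = {1: "ABCD", 2: "ADCB", 3: "ACBD", 4: "ABCD"}
WOMEN = {"A": [1, 3, 2, 4], "B": [4, 3, 2, 1], "C": [2, 3, 1, 4], "D": [3, 4, 2, 1]}


def is_stable(wife):
    """wife maps man -> woman; the pairing is stable if no rogue couple exists."""
    husband = {w: m for m, w in wife.items()}
    for m, prefs in MEN.items():
        for w in prefs[:prefs.index(wife[m])]:       # women m prefers to his partner
            if WOMEN[w].index(m) < WOMEN[w].index(husband[w]):
                return False
    return True


def stable_pairings():
    """Try every assignment of women to men (n! candidates) and keep the stable ones."""
    men = sorted(MEN)
    candidates = (dict(zip(men, ws)) for ws in permutations("ABCD"))
    return [p for p in candidates if is_stable(p)]


def optimal_woman(m, pairings):
    """Highest woman on m's list among his partners in the given stable pairings."""
    return min((p[m] for p in pairings), key=MEN[m].index)


pairings = stable_pairings()
print(pairings)
print({m: optimal_woman(m, pairings) for m in MEN})
```

The search finds exactly the two pairings S and T, and the optimal women it reports are the partners in S.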

    We can also go in the opposite direction and define the pessimal woman for a man to be the lowest ranked

    woman whom he is ever paired with in some stable pairing. This leads naturally to the notion of a male pessimal pairing. Can you define it, and also a female pessimal pairing?

    Now, a natural question to ask is: Who is better off in the Stable Marriage Algorithm: men or women?

    Think about this before you read on...

    Theorem: The pairing output by the Stable Marriage algorithm is male optimal.

    Proof: Suppose for the sake of contradiction that the pairing is not male optimal. Assume the first day when

    a man got rejected by his optimal woman was day k. On this day, M was rejected by W* (his optimal mate)

    in favor of M* who proposed to her. By definition of optimal woman, there must exist a stable pairing

    T in which M and W* are paired together. Suppose T looks like this: {. . . , (M,W*), . . . , (M*,W'), . . .}. We

    will argue that (M*,W*) is a rogue couple in T, thus contradicting stability.

    First, it is clear that W* prefers M* to M, since she rejected M in favor of M* during the execution of the

    stable marriage algorithm. Moreover, since day k was the first day when some man got rejected by his

    optimal woman, on day k M* had not yet been rejected by his optimal woman. Since he proposed to W* on

    the k-th day, this implies that M* likes W* at least as much as his optimal woman, and therefore at least as

    much as W', his partner in the stable pairing T. Since W' is not W*, he prefers W* to W'. Therefore, (M*,W*) form a rogue couple in T, and so T is not stable. Contradiction. Thus,

    our assumption was wrong and the pairing is male optimal.

    What proof techniques did we use to prove this theorem? Again it is a proof by induction, structured as an

    application of the well-ordering principle. How do we see it as a regular induction proof? This is a bit subtle

    to figure out. See if you can do so before reading on. As a hint, the proof is really showing by induction on k

    the following statement: for every k, no man gets rejected by his optimal woman on the kth day. [Exercise:

    can you complete the induction?]

    So men appear to fare very well by following this algorithm. How about the women? The following theorem

    confirms the sad truth:

    Theorem: If a pairing is male optimal, then it is also female pessimal.

    Proof: Let T = {. . . , (M, W), . . . } be the male optimal pairing (which we know is output by the algorithm).

    Suppose for the sake of contradiction that there exists a stable pairing S = {. . . , (M*, W), . . . , (M, W'), . . . }

    such that M* is lower on W's list than M (i.e., M is not her pessimal man). We will argue that S cannot

    possibly be stable by showing that (M,W) is a rogue couple in S. By assumption, W prefers M to M* since

    M* is lower on her list. And M prefers W to his partner W' in S because W is his partner in the male optimal

    pairing T. Contradiction. Therefore, the male optimal pairing is female pessimal.

    All this seems a bit unfair to the women! Are there any lessons to be learned from this? Make the first move!

    Back to the National Residency Matching Program, until recently the algorithm was run with the hospitals

    doing the proposing, and so the pairings produced were hospital optimal. In the nineties, the roles were

    reversed so that the medical students were proposing to the hospitals. More recently, there were other

    improvements made to the algorithm which the N.R.M.P. used. For example, the pairing takes into account

    preferences for married students for positions at the same or nearby hospitals.

    Further reading (optional!)

    Though it was in use 10 years earlier, the propose-and-reject algorithm was first properly analyzed by Gale and Shapley, in a famous paper dating back to 1962 that still stands as one of the great achievements in the

    analysis of algorithms. The full reference is:

    D. Gale and L.S. Shapley, College Admissions and the Stability of Marriage, American Mathematical

    Monthly 69 (1962), pp. 9-14.

    Stable marriage and its numerous variants remain an active topic of research in computer science. Although

    it is by now twenty years old, the following very readable book covers many of the interesting developments

    since Gale and Shapley's algorithm:

    D. Gusfield and R.W. Irving, The Stable Marriage Problem: Structure and Algorithms, MIT Press, 1989.

    CS 70 Discrete Mathematics and Probability Theory

    Fall 2013 Vazirani Note 3

    Modular Arithmetic

    In several settings, such as error-correcting codes and cryptography, we sometimes wish to work over a

    smaller range of numbers. Modular arithmetic is useful in these settings, since it limits numbers to a predefined

    range $\{0, 1, \ldots, N-1\}$, and wraps around whenever you try to leave this range, like the hand of a clock (where $N = 12$) or the days of the week (where $N = 7$).

    Example: Calculating the time When you calculate the time, you automatically use modular arithmetic.

    For example, if you are asked what time it will be 13 hours from 1 pm, you say 2 am rather than 14.

    Let's assume our clock displays 12 as 0. This is limiting numbers to a predefined range, $\{0, 1, 2, \ldots, 11\}$. Whenever you add two numbers in this setting, you divide by 12 and provide the remainder as the answer.

    If we wanted to know what the time would be 24 hours from 2 pm, the answer is easy. It would be 2 pm.

    This is true not just for 24 hours, but for any multiple of 12 hours. What about 25 hours from 2 pm? Since

    the time 24 hours from 2 pm is still 2 pm, 25 hours later it would be 3 pm. Another way to say this is that

    we add 1 hour, which is the remainder when we divide 25 by 12.

    This example shows that under certain circumstances it makes sense to do arithmetic within the confines

    of a particular number (12 in this example). That is, we only keep track of the remainder when we divide

    by 12, and when we need to add two numbers, instead we just add the remainders. This method is quite

    efficient in the sense of keeping intermediate values as small as possible, and we shall see in later notes how

    useful it can be.
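The clock calculations above amount to one remainder operation. A minimal sketch, assuming a 12-hour clock that displays 12 as 0 (the function name `clock_after` is illustrative):

```python
def clock_after(start_hour, elapsed_hours):
    """Hour shown on a 12-hour clock (12 displayed as 0) after elapsed_hours."""
    return (start_hour + elapsed_hours) % 12


print(clock_after(1, 13))   # 13 hours after 1 o'clock
print(clock_after(2, 24))   # 24 hours after 2 o'clock
print(clock_after(2, 25))   # 25 hours after 2 o'clock
```

The three calls give 2, 2 and 3, matching the answers worked out in the text.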

    More generally we can define $x \bmod m$ (in words, $x$ modulo $m$) to be the remainder $r$ when we divide $x$ by

    $m$; i.e., if $x \bmod m = r$, then $x = mq + r$ where $0 \le r \le m-1$ and $q$ is an integer. Thus $5 = 29 \bmod 12$ and $3 = 13 \bmod 5$.

    Computation

    If we wish to calculate $x + y \bmod m$, we would first add $x + y$ and then calculate the remainder when we divide the result by $m$. For example, if $x = 14$, $y = 25$ and $m = 12$, we would compute the remainder when we divide $x + y = 14 + 25 = 39$ by 12, to get the answer 3. Notice that we would get the same answer if we first computed $2 = x \bmod 12$ and $1 = y \bmod 12$ and added the results modulo 12 to get 3. The same holds for subtraction: $x - y \bmod 12$ is $-11 \bmod 12$, which is 1. Again, we could have directly obtained this as $2 - 1$ by first simplifying $x \bmod 12$ and $y \bmod 12$.

    This is even more convenient if we are trying to multiply: to compute $xy \bmod 12$, we could first compute

    $xy = 14 \cdot 25 = 350$ and then compute the remainder when we divide by 12, which is 2. Notice that we get the same answer if we first compute $2 = x \bmod 12$ and $1 = y \bmod 12$ and simply multiply the results modulo 12.

    More generally, while carrying out any sequence of additions, subtractions or multiplications mod m, we

    get the same answer even if we reduce any intermediate results mod m. This can considerably simplify the

    calculations.
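As a quick check of this claim on the numbers used above (x = 14, y = 25, m = 12), the sketch below compares reducing only at the end with reducing the operands first. It relies on Python's `%` operator returning a value in {0, ..., m-1} for a positive modulus, which is why -11 % 12 comes out as 1.

```python
x, y, m = 14, 25, 12

# Reduce only once, after the full-size result has been computed.
direct = ((x + y) % m, (x - y) % m, (x * y) % m)

# Reduce the operands first, then reduce again after each operation.
rx, ry = x % m, y % m                      # 2 and 1
reduced = ((rx + ry) % m, (rx - ry) % m, (rx * ry) % m)

print(direct, reduced)
```

Both tuples are (3, 1, 2), the sum, difference and product computed in the text.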

    Set Representation

    There is an alternate view of modular arithmetic which helps understand all this better. For any integer $m$

    we say that $x$ and $y$ are congruent modulo $m$ if they differ by a multiple of $m$, or in symbols,

    $$x = y \bmod m \iff m \text{ divides } (x - y).$$

    Note that you may also see this written as $x \equiv y \bmod m$. For example, 29 and 5 are congruent modulo 12 because 12 divides $29 - 5$. We can also write $22 = -2 \bmod 12$. Notice that $x$ and $y$ are congruent modulo $m$ iff they have the same remainder modulo $m$.

    What is the set of numbers that are congruent to 0 mod 12? These are all the multiples of 12:

    $\{\ldots, -36, -24, -12, 0, 12, 24, 36, \ldots\}$. What about the set of numbers that are congruent to 1 mod 12? These are all the numbers that give a remainder 1 when divided by 12: $\{\ldots, -35, -23, -11, 1, 13, 25, 37, \ldots\}$. Similarly the set of numbers congruent to 2 mod 12 is $\{\ldots, -34, -22, -10, 2, 14, 26, 38, \ldots\}$. Notice in this way we get 12 such sets of integers, and every integer belongs to one and only one of these sets.

    In general if we work modulo $m$, then we get $m$ such disjoint sets whose union is the set of all integers. We

    can think of each set as represented by the unique element it contains in the range $(0, \ldots, m-1)$. The set represented by element $i$ would be all numbers $z$ such that $z = mx + i$ for some integer $x$. Observe that all of these numbers have remainder $i$ when divided by $m$; they are therefore congruent modulo $m$.
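The sets described above can be listed over a finite window of integers. This is a small sketch; the function name `residue_class` and the window bounds are arbitrary choices made for illustration.

```python
def residue_class(i, m, lo=-36, hi=38):
    """Integers z in [lo, hi] with z = m*x + i for some integer x."""
    return [z for z in range(lo, hi + 1) if z % m == i]


for i in range(3):
    print(i, residue_class(i, 12))

# Every integer in the window lands in exactly one of the 12 sets.
classes = [residue_class(i, 12) for i in range(12)]
assert sorted(z for c in classes for z in c) == list(range(-36, 39))
```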

    We can understand the operations of addition, subtraction and multiplication in terms of these sets. When

    we add two numbers, say x = 2 mod 12 and y = 1 mod 12, it does not matter which x and y we pick from the two sets, since the result is always an element of the set that contains 3. The same is true about subtraction

    and multiplication. It should now be clear that the elements of each set are interchangeable when computing

    modulo m, and this is why we can reduce any intermediate results modulo m.

    Here is a more formal way of stating this observation:

    Theorem 3.1: If $a = c \bmod m$ and $b = d \bmod m$, then $a + b = c + d \bmod m$ and $a \cdot b = c \cdot d \bmod m$.

    Proof: We know that $c = a + km$ and $d = b + \ell m$ for some integers $k$ and $\ell$, so $c + d = a + km + b + \ell m = a + b + (k + \ell)m$, which means that $a + b = c + d \bmod m$. The proof for multiplication is similar and left as an exercise.

    What this theorem tells us is that we can always reduce any arithmetic expression modulo m into a natural

    number smaller than $m$. As an example, consider the expression $(13 + 11) \cdot 18 \bmod 7$. Using the above Theorem several times we can write:

    \begin{align*}
    (13 + 11) \cdot 18 &= (6 + 4) \cdot 4 \bmod 7 \\
    &= 10 \cdot 4 \bmod 7 \\
    &= 3 \cdot 4 \bmod 7 \\
    &= 12 \bmod 7 \\
    &= 5 \bmod 7.
    \end{align*}

    In summary, we can always do calculations modulo m by reducing intermediate results modulo m.
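The worked example and Theorem 3.1 can both be spot-checked numerically. The sketch below is a sanity test over a small range, not a proof; the function names and the `span` parameter are illustrative.

```python
def reduce_stepwise(m=7):
    """Evaluate (13 + 11) * 18 mod m, reducing after every step as in the example."""
    a, b, c = 13 % m, 11 % m, 18 % m     # 6, 4, 4 when m = 7
    s = (a + b) % m                       # 10 reduces to 3
    return (s * c) % m                    # 12 reduces to 5


def theorem_3_1_holds(m, span=3):
    """Check a+b = c+d and a*b = c*d (mod m) for c = a + k*m, d = b + l*m, |k|, |l| <= span."""
    for a in range(m):
        for b in range(m):
            for k in range(-span, span + 1):
                for l in range(-span, span + 1):
                    c, d = a + k * m, b + l * m
                    if (a + b) % m != (c + d) % m or (a * b) % m != (c * d) % m:
                        return False
    return True


print(reduce_stepwise(), ((13 + 11) * 18) % 7)
print(all(theorem_3_1_holds(m) for m in range(1, 13)))
```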

    Inverses

    We have so far discussed addition, multiplication and subtraction. What about division? This is a bit harder.

    Over the reals dividing by a number x is the same as multiplying by y = 1/x. Here y is that number such

    that $x \cdot y = 1$. Of course we have to be careful when $x = 0$, since such a $y$ does not exist. Similarly, when we wish to divide by $x \bmod m$, we need to find $y \bmod m$ such that $x \cdot y = 1 \bmod m$; then dividing by $x$ modulo $m$ will be the same as multiplying by $y$ modulo $m$. Such a $y$ is called the multiplicative inverse of $x$ modulo

    $m$. In our present setting of modular arithmetic, can we be sure that $x$ has an inverse mod $m$, and if so, is it

    unique (modulo $m$) and can we compute it?

    As a first example, take x =8 and m =15. Then 2x=16=1 mod 15, so 2 is a multiplicative inverse of 8

    mod 15. As a second example, take $x = 12$ and $m = 15$. Then the sequence $\{ax \bmod m : a = 0, 1, 2, \ldots\}$ is periodic, and takes on the values $\{0, 12, 9, 6, 3\}$ (check this!). Since 1 never appears among these values, 12 has no multiplicative inverse mod 15.
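Both examples can be confirmed by exhaustive search over a = 0, ..., m-1. This is a brute-force sketch for small m only; the function names are illustrative.

```python
def inverses(x, m):
    """All a in {0, ..., m-1} with a*x = 1 (mod m), found by trying every a."""
    return [a for a in range(m) if (a * x) % m == 1]


def multiples(x, m):
    """Distinct values of a*x mod m for a = 0, 1, 2, ..., in order of first appearance."""
    seen = []
    for a in range(m):
        v = (a * x) % m
        if v not in seen:
            seen.append(v)
    return seen


print(inverses(8, 15))     # a single inverse
print(inverses(12, 15))    # no inverse
print(multiples(12, 15))   # the periodic values from the text
```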

    So when does x have a multiplicative inverse modulo m? The answer is: iff the greatest common divisor

    of m and x is 1. Moreover, when the inverse exists it is unique. Recall that the greatest common divisor of

    two natural numbers x and y, denoted gcd(x,y), is the largest natural number that divides them both. For example, gcd(30,24) = 6. If gcd(x,y) is 1, it means that x and y share no common factors (except 1). This is often expressed by saying that x and y are relatively prime.

    Theorem 3.2: Let m, x be positive integers such that gcd(m,x) = 1. Then x has a multiplicative inverse modulo m, and it is unique (modulo m).

    Proof: Consider the sequence of $m$ numbers $0, x, 2x, \ldots, (m-1)x$. We claim that these are all distinct modulo

    $m$. Since there are only $m$ distinct values modulo $m$, it must then be the case that $ax = 1 \bmod m$ for exactly one $a$ (modulo $m$). This $a$ is the unique multiplicative inverse.

    To verify the above claim, suppose that $ax = bx \bmod m$ for two distinct values $a, b$ in the range $0 \le b < a \le m-1$. Then we would have $(a-b)x = 0 \bmod m$, or equivalently, $(a-b)x = km$ for some integer $k$ (possibly zero or negative).

    However, $x$ and $m$ are relatively prime, so $x$ cannot share any factors with $m$. This implies that $a-b$ must be an integer multiple of $m$. This is not possible, since $a-b$ ranges between 1 and $m-1$.

    Actually it turns out that $\gcd(m,x) = 1$ is also a necessary condition for the existence of an inverse: i.e., if $\gcd(m,x) > 1$ then $x$ has no multiplicative inverse modulo $m$. You might like to try to prove this using a similar idea to that in the above proof.

    Since we know that multiplicative inverses are unique when $\gcd(m,x) = 1$, we shall write the inverse of $x$ as $x^{-1} \bmod m$. Being able to compute the multiplicative inverse of a number is crucial to many applications,

    so ideally the algorithm used should be efficient. It turns out that we can use an extended version of Euclid's

    algorithm, which computes the gcd of two numbers, to compute the multiplicative inverse.
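Before attempting the proof, the "if and only if" can be tested empirically for small moduli. The sketch below uses `math.gcd` from the Python standard library; the helper names are illustrative, and a passing check is evidence rather than a proof.

```python
from math import gcd


def has_inverse(x, m):
    """True if some a in {0, ..., m-1} satisfies a*x = 1 (mod m)."""
    return any((a * x) % m == 1 for a in range(m))


def check_gcd_criterion(m):
    """True if, for every x in 1..m-1, x is invertible mod m exactly when gcd(m, x) = 1."""
    return all(has_inverse(x, m) == (gcd(m, x) == 1) for x in range(1, m))


print(all(check_gcd_criterion(m) for m in range(2, 60)))
```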

    Computing the Multiplicative Inverse

    Let us first discuss how computing the multiplicative inverse of $x$ modulo $m$ is related to finding $\gcd(x,m)$. For any pair of numbers $x, y$, suppose we could not only compute $\gcd(x,y)$, but also find integers $a, b$ such that

    $$d = \gcd(x,y) = ax + by. \tag{1}$$

    (Note that this is not a modular equation; and the integers $a, b$ could be zero or negative.) For example, we can write $1 = \gcd(35,12) = -1 \cdot 35 + 3 \cdot 12$, so here $a = -1$ and $b = 3$ are possible values for $a, b$.

    If we could do this then we'd be able to compute inverses, as follows. We first find the integers $a$ and $b$ such

    that

    $$1 = \gcd(m,x) = am + bx.$$

    But this means that $bx = 1 \bmod m$, so $b$ is the multiplicative inverse of $x$ modulo $m$. Reducing $b$ modulo $m$ gives us the unique inverse we are looking for. In the above example, we see that 3 is the multiplicative

    inverse of 12 mod 35. So, we have reduced the problem of computing inverses to that of finding integers

    $a, b$ that satisfy equation (1). Remarkably, Euclid's algorithm for computing gcds also allows us to find the integers $a$ and $b$ described above. So computing the multiplicative inverse of $x$ modulo $m$ is as simple as

    running Euclid's gcd algorithm on input $x$ and $m$!

    Euclid's Algorithm

    If we wish to compute the gcd of two numbers x and y, how would we proceed? If x or y is 0, then computing

    the gcd is easy; it is simply the other number, since 0 is divisible by everything (although of course it divides

    nothing). The algorithm for computing gcd(x,y) uses the following theorem to eventually reduce to the case where one of the numbers is 0:

    Theorem 3.3: Let $x \ge y$ and let $q, r$ be natural numbers such that $x = yq + r$ and $r < y$ [...]

    Let's go through a quick example of this recursive implementation of Euclid's algorithm. We wish to

    compute gcd(32,10):

    gcd(32,10) = gcd(10,2)

    = gcd(2,0)

    = 2
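The chain of equalities above follows one rule: replace (x, y) by (y, x mod y) until the second number is 0. A minimal sketch of that recursion, assuming natural-number inputs with x >= y; `gcd_trace` is an illustrative helper that records the intermediate pairs.

```python
def gcd(x, y):
    """Euclid's algorithm: gcd(x, y) = gcd(y, x mod y), and gcd(x, 0) = x."""
    if y == 0:
        return x
    return gcd(y, x % y)


def gcd_trace(x, y):
    """The sequence of argument pairs visited by the recursion."""
    steps = [(x, y)]
    while y != 0:
        x, y = y, x % y
        steps.append((x, y))
    return steps


print(gcd(32, 10))
print(gcd_trace(32, 10))
```

The trace for (32, 10) visits (32, 10), (10, 2) and (2, 0), the same pairs as in the worked example.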

    Extended Euclid's Algorithm

    In order to compute the multiplicative inverse, we need an algorithm which also returns integers a and b

    such that:

    $$\gcd(x,y) = ax + by.$$

    Now since this problem is a generalization of the basic gcd, it is perhaps not too surprising that we can solve it with a fairly straightforward extension of Euclid's algorithm.

    Examples

    Let's first see how we would compute such numbers for $x = 6$ and $y = 4$. We'll need the equations from our example above, copied here for reference:

    \begin{align*}
    16 &= 10 \cdot 1 + 6 \\
    10 &= 6 \cdot 1 + 4 \\
    6 &= 4 \cdot 1 + 2 \\
    4 &= 2 \cdot 2 + 0
    \end{align*}

    From the last two equations it follows that $\gcd(6,4) = 2$. But now the second last equation gives us the numbers $a, b$, since we just rearrange that equation to say $2 = 6 \cdot 1 - 4 \cdot 1$. So $a = 1$ and $b = -1$.

    What if we started with $x = 10$ and $y = 6$? Now we would write the last three equations to determine that $\gcd(10,6) = 2$. But how do we find $a, b$? Start as above and write $2 = 6 \cdot 1 - 4 \cdot 1$. But we want 10 and 6 on the right hand side, not 6 and 4. But notice that the third from the last equation allows us to write 4 as a linear

    combination of 6 and 10 and so we can just back substitute: we rewrite that equation as $4 = 10 \cdot 1 - 6 \cdot 1$ and substitute to get:

    $$2 = 6 \cdot 1 - 4 \cdot 1 = 6 \cdot 1 - (10 \cdot 1 - 6 \cdot 1) = 6 \cdot 2 - 10 \cdot 1.$$

    If we started with $x = 16$ and $y = 10$ we would back substitute again using the first equation rewritten as

    $6 = 16 - 10$ to get: $2 = 6 \cdot 2 - 10 \cdot 1 = (16 - 10) \cdot 2 - 10 = 16 \cdot 2 - 10 \cdot 3$. So $a = 2$ and $b = -3$.

    Algorithm

    The following recursive algorithm extended-gcd implements the idea used in the examples above. It takes as

    input a pair of natural numbers $x \ge y$ as in Euclid's algorithm, and returns a triple of integers $(d, a, b)$ such that $d = \gcd(x,y)$ and $d = ax + by$:

    algorithm extended-gcd(x,y)
      if y = 0 then return(x, 1, 0)
      else
        (d, a, b) := extended-gcd(y, x mod y)
        return((d, b, a - (x div y) * b))

    Note that this algorithm has the same form as the basic gcd algorithm we saw earlier; the only difference is

    that we now carry around in addition the required values a, b. You should hand-turn the algorithm on the input (x,y) = (16,10) from our earlier example, and check that it delivers correct values for a, b.
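For readers who prefer to check the hand computation by machine, here is a direct Python transcription of the pseudocode, with `//` playing the role of `div`. The wrapper `mod_inverse` is an illustrative addition that applies the reduction described earlier (find a, b with 1 = a*m + b*x, then reduce b modulo m); it is not part of the original algorithm.

```python
def extended_gcd(x, y):
    """Return (d, a, b) with d = gcd(x, y) and d = a*x + b*y, for natural numbers x >= y."""
    if y == 0:
        return (x, 1, 0)
    d, a, b = extended_gcd(y, x % y)
    return (d, b, a - (x // y) * b)


def mod_inverse(x, m):
    """Multiplicative inverse of x modulo m, or None when gcd(m, x) != 1."""
    d, _, b = extended_gcd(m, x)     # d = a*m + b*x, so b*x = d (mod m)
    return b % m if d == 1 else None


print(extended_gcd(16, 10))   # matches the back-substitution example
print(mod_inverse(12, 35))
print(mod_inverse(8, 15))
print(mod_inverse(12, 15))
```

On the input (16, 10) this returns (2, 2, -3), agreeing with a = 2 and b = -3 found by back substitution.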

    You'll see a full analysis of this algorithm in CS 170, including correctness and efficiency (the running

    time is $O(n^3)$ for $n$-bit inputs). Let us understand intuitively why the numbers $a$ and $b$ returned by the algorithm should give us what we are looking for. We just need to generalize the back substitution method we used in the

    example above. The algorithm reduces finding $\gcd(x,y)$ to finding $\gcd(y, x \bmod y)$. Once the algorithm finds $\gcd(y, x \bmod y)$, it returns values $a$ and $b$ such that:

    $$d = ay + b(x \bmod y). \tag{2}$$

    Now we need to update these values of $a$ and $b$, say to $A$ and $B$, such that

    $$d = Ax + By. \tag{3}$$

    To figure out what $A$ and $B$ should be, we need to rearrange equation (2), as follows:

    \begin{align*}
    d &= ay + b(x \bmod y) \\
    &= ay + b(x - \lfloor x/y \rfloor y) \\
    &= bx + (a - \lfloor x/y \rfloor b)y.
    \end{align*}

    (In the second line here, we have used the fact that $x \bmod y = x - \lfloor x/y \rfloor y$; check this!) Comparing this last equation with equation (3), we see that we need to take $A = b$ and $B = a - \lfloor x/y \rfloor b$. This is exactly what the algorithm does. Of course we have not fully proved correctness, but you should be able to see why the

    algorithm works.

    CS 70 Discrete Mathematics and Probability Theory

    Fall 2013 Vazirani Note 4

    This note is partly based on Section 1.4 of "Algorithms," by S. Dasgupta, C. Papadimitriou and U. Vazirani, McGraw-Hill, 2007.

    An online draft of the book is available at http://www.cs.berkeley.edu/~vazirani/algorithms.html

    Public Key Cryptography

    Bijections

    This note introduces the fundamental concept of a function, as well as a famous function called RSA that

    forms the basis of public-key cryptography. A function is a mapping from a set of inputs $A$ to a set of outputs

    $B$: for input $x \in A$, $f(x)$ must be in the set $B$. To denote such a function, we write $f : A \to B$.

    Consider the following examples of functions, where both functions map $\{0, \ldots, m-1\}$ to itself:

    $$f(x) = x + 1 \bmod m$$

    $$g(x) = 2x \bmod m$$

    A bijection is a function for which every $b \in B$ has a unique pre-image $a \in A$ such that $f(a) = b$. Note that this consists of two conditions:

    1. $f$ is onto: every $b \in B$ has a pre-image $a \in A$.

    2. $f$ is one-to-one: for all $a, a' \in A$, if $f(a) = f(a')$ then $a = a'$.

    Looking back at our examples, we can see that $f$ is a bijection; the unique pre-image of $y$ is $y - 1 \bmod m$. However, $g$ is only a bijection if $m$ is odd. Otherwise, it is neither one-to-one nor onto. The following lemma can be

    used to prove that a function is a bijection:

    Lemma: For a finite set $A$, $f : A \to A$ is a bijection if there is an inverse function $g : A \to A$ such that $\forall x \in A$, $g(f(x)) = x$.

    Proof: If $f(x) = f(x')$, then $x = g(f(x)) = g(f(x')) = x'$. Therefore, $f$ is one-to-one. Since $f$ is one-to-one, there must be $|A|$ elements in the range of $f$. This implies that $f$ is also onto.
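The counting idea in this proof also gives a simple test on a finite set: a map from {0, ..., m-1} to itself is a bijection exactly when it takes m distinct values. The sketch below applies that test to the two example functions; the helper name `is_bijection` is illustrative.

```python
def is_bijection(func, m):
    """True if func, viewed as a map from {0, ..., m-1} to itself, takes m distinct values."""
    return len({func(x) for x in range(m)}) == m


def f(x, m):
    return (x + 1) % m


def g(x, m):
    return (2 * x) % m


for m in (5, 6):
    print(m, is_bijection(lambda x: f(x, m), m), is_bijection(lambda x: g(x, m), m))
```

For m = 5 both functions pass; for m = 6 only f does, consistent with g being a bijection only when m is odd.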

    RSA

    One of the most useful bijections is the RSA function, named after its inventors Ronald Rivest, Adi Shamir

    and Leonard Adleman:

    $$E(x) \equiv x^e \bmod N$$

    where $N = pq$ ($p$ and $q$ are two large primes), $E : \{0, \ldots, N-1\} \to \{0, \ldots, N-1\}$ and $e$ is relatively prime to $(p-1)(q-1)$. The inverse of the RSA function is:

    $$D(x) \equiv x^d \bmod N$$

    where $d$ is the inverse of $e$ mod $(p-1)(q-1)$.

    Consider the following setting. Alice and Bob wish to communicate confidentially over some (insecure)

    link. Eve, an eavesdropper who is listening in, would like to discover what they are saying. Let's assume

    that Alice wants to transmit a message $x$ (a number between 1 and $N-1$) to Bob. She will apply her encryption function $E$ to $x$ and send the encrypted message $E(x)$ (also called the cyphertext) over the link; Bob, upon receipt of $E(x)$, will then apply his decryption function $D$ to it. Since $D$ is the inverse of $E$, Bob will obtain the original message $x$. However, how can we be sure that Eve cannot also obtain $x$?

    In order to encrypt the message, Alice only needs Bob's public key $(N, e)$. In order to decrypt the message, Bob needs his private key $d$. The pair $(N, e)$ can be thought of as a public lock: anyone can place a message in a box and lock it, but only Bob has the key $d$ to open the lock. The idea is that since Eve does not have

    access to $d$, she will not be able to gain information about Alice's message.
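To make the roles of (N, e) and d concrete, here is a toy instance. The primes p = 5 and q = 11 and the exponent e = 3 are assumptions picked only so the numbers stay readable; they offer no security. The private exponent is found by brute force, which is feasible only because (p-1)(q-1) = 40 is tiny.

```python
p, q = 5, 11                          # toy primes; real keys use primes with hundreds of bits
N, phi = p * q, (p - 1) * (q - 1)     # N = 55, (p-1)(q-1) = 40
e = 3                                 # public exponent, relatively prime to 40
d = next(k for k in range(1, phi) if (e * k) % phi == 1)   # inverse of e mod (p-1)(q-1)


def E(x):
    """Encryption with the public key (N, e)."""
    return pow(x, e, N)


def D(y):
    """Decryption with the private key d."""
    return pow(y, d, N)


x = 13
print(d, E(x), D(E(x)))
```

Here `pow(x, e, N)` is Python's built-in three-argument form, which computes x^e mod N. With these values d = 27, the message 13 encrypts to 52, and decrypting 52 returns 13.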

    We will now prove that $D(E(x)) = x$ (and therefore $E(x)$ is a bijection). We will require a beautiful theorem from number theory known as Fermat's Little Theorem, which is the following:

    Theorem 4.1: [Fermat's Little Theorem] For any prime $p$ and any $a \in \{1, 2, \ldots, p-1\}$, we have $a^{p-1} = 1 \bmod p$.

    Let $S$ be the nonzero integers modulo $p$; that is, $S = \{1, 2, \ldots, p-1\}$. Define a function $f : S \to S$ such that $f(x) \equiv ax \bmod p$. Here's the crucial observation: $f$ is simply a bijection from $S$ to $S$; it permutes the elements of $S$. For instance, here's a picture of the case $a = 3$, $p = 7$:

    [Figure 1: Multiplication by (3 mod 7)]

    With this intuition, we can now prove Fermats Little Theorem:

    Proof: Our first claim is that $f(x)$ is a bijection. We will then show that this claim implies the theorem.

    To show that $f$ is a bijection, we simply need to argue that the numbers $a \cdot i \bmod p$ (for $i = 1, \ldots, p-1$) are distinct and nonzero. They are distinct because if $a \cdot i \equiv a \cdot j \pmod{p}$, then dividing both sides by $a$ gives $i \equiv j \pmod{p}$. They are nonzero because $a \cdot i \equiv 0$ similarly implies $i \equiv 0$. (And we can divide by $a$, because by assumption it is nonzero and therefore relatively prime to $p$.)

    Now we can prove the theorem. Since $f$ is a bijection, we know that the image of $f$ is $S$. Now if we take the

    product of all elements in $S$, it is equal to the product of all elements in the image of $f$:

    $$(p-1)! \equiv a^{p-1} \cdot (p-1)! \pmod{p}.$$

    Dividing by $(p-1)!$ (which we can do because it is relatively prime to $p$, since $p$ is assumed prime) then gives the theorem.
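Both steps of the proof can be observed numerically for small primes. The sketch below tabulates multiplication by a on S and then tests the statement of the theorem; the function names are illustrative, and checking a handful of primes is only a sanity test.

```python
def times_a(a, p):
    """The map x -> a*x mod p on S = {1, ..., p-1}, returned as a dict."""
    return {x: (a * x) % p for x in range(1, p)}


def fermat_holds(p):
    """True if a^(p-1) = 1 (mod p) for every a in {1, ..., p-1}."""
    return all(pow(a, p - 1, p) == 1 for a in range(1, p))


perm = times_a(3, 7)
print(perm)                              # multiplication by 3 mod 7
print(sorted(perm.values()))             # the images are again 1, ..., 6
print([fermat_holds(p) for p in (2, 3, 5, 7, 11, 13)])
```

For a = 3 and p = 7 the map sends 1, 2, 3, 4, 5, 6 to 3, 6, 2, 5, 1, 4, a rearrangement of S, which is the permutation that Figure 1 depicts.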

    Let us return to proving that D(E(x)) =x:

    Theorem 4.2: Under the above definitions of the encryption and decryption functions $E$ and $D$, we have

    $D(E(x)) = x \bmod N$ for every possible message $x \in \{0, 1, \ldots, N-1\}$.

    The proof of this theorem relies on Fermat's Little Theorem:

    Proof of Theorem 4.2: To prove the statement, we have to show that

    $$(x^e)^d = x \bmod N \quad \text{for every } x \in \{0, 1, \ldots, N-1\}. \tag{1}$$

    Let's consider the exponent, which is $ed$. By definition of $d$, we know that $ed = 1 \bmod (p-1)(q-1)$; hence we can write $ed = 1 + k(p-1)(q-1)$ for some integer $k$, and therefore

    $$x^{ed} - x = x^{1+k(p-1)(q-1)} - x = x\,(x^{k(p-1)(q-1)} - 1). \tag{2}$$

    Looking back at equation (1), our goal is to show that this last expression in equation (2) is equal to $0 \bmod N$

    for every $x$.

    Now we claim that the expression $x(x^{k(p-1)(q-1)} - 1)$ in (2) is divisible by $p$. To see this, we consider two cases:

    Case 1: $x$ is not a multiple of $p$. In this case, since $x \neq 0 \bmod p$, we can use Fermat's Little Theorem to deduce that $x^{p-1} = 1 \bmod p$. Then $(x^{p-1})^{k(q-1)} \equiv 1^{k(q-1)} \bmod p$ and hence $x^{k(p-1)(q-1)} - 1 = 0 \bmod p$, as required.

    Case 2: x is a multiple of p. In this case the expression in (2), which hasx as a factor, is clearly divisible by p.

    By an entirely symmetrical argument, $x(x^{k(p-1)(q-1)} - 1)$ is also divisible by $q$. Therefore, it is divisible by both $p$ and $q$, and since $p$ and $q$ are distinct primes it must be divisible by their product, $pq = N$. But this implies

    that the expression is equal to 0 modN, which is exactly what we wanted to prove.
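The divisibility argument can be checked exhaustively on toy parameters, including the messages that are multiples of p or q (Case 2). This sketch uses exact integer arithmetic, so it is only sensible for very small primes; the function name and the parameter choices in the calls are assumptions made for illustration, and e must be relatively prime to (p-1)(q-1) for the search for d to succeed.

```python
def check_rsa_identity(p, q, e):
    """True if x^(ed) - x is divisible by p, by q and by N = pq for every x in {0, ..., N-1}."""
    N, phi = p * q, (p - 1) * (q - 1)
    d = next(k for k in range(1, phi) if (e * k) % phi == 1)   # inverse of e mod (p-1)(q-1)
    for x in range(N):
        diff = x ** (e * d) - x          # exact big-integer arithmetic; toy sizes only
        if diff % p or diff % q or diff % N:
            return False
    return True


print(check_rsa_identity(5, 11, 3))
print(check_rsa_identity(3, 11, 7))
```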

    So we have seen that the RSA protocol is correct, in the sense that Alice can encrypt messages in such a way

    that Bob can reliably decrypt them again. But how do we know that it is secure, i.e., that Eve cannot get any

    useful information by observing the encrypted messages? The security of RSA hinges upon the following

    simple assumption:

    Given $N$, $e$ and $y = x^e \bmod N$, there is no efficient algorithm for determining $x$.

    This assumption is quite plausible. How might Eve try to guess x? She could experiment with all possible

    values of $x$, each time checking whether $x^e = y \bmod N$; but she would have to try on the order of $N$ values

    of $x$, which is completely unrealistic if $N$ is a number with (say) 512 bits. This is because $N \approx 2^{512}$

    is larger than estimates for the age of the Universe in femtoseconds! Alternatively, she could try to factor $N$ to

    retrieve $p$ and $q$, and then figure out $d$ by computing the inverse of $e$ mod $(p-1)(q-1)$; but this approach requires Eve to be able to factor $N$ into its prime factors, a problem which is believed to be impossible to

    solve efficiently for large values of $N$. She could try to compute the quantity $(p-1)(q-1)$ without factoring $N$; but it is possible to show that computing $(p-1)(q-1)$ is equivalent to factoring $N$. We should point out that the security of RSA has not been formally proved: it rests on the assumptions that breaking RSA is

    essentially tantamount to factoring N, and that factoring is hard.
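    To make the factoring attack concrete, here is a minimal sketch of what Eve would do if factoring were easy. It uses naive trial division, which only works because the modulus is tiny; the names `factor_semiprime` and `recover_message` and the values N = 55, e = 3 are illustrative assumptions.

```python
def factor_semiprime(N):
    """Find (p, q) with p * q == N by trial division (feasible only for tiny N)."""
    p = 2
    while p * p <= N:
        if N % p == 0:
            return p, N // p
        p += 1
    raise ValueError("N has no nontrivial factor")

def recover_message(N, e, y):
    """Eve's attack: factor N, derive d, then decrypt y."""
    p, q = factor_semiprime(N)
    d = pow(e, -1, (p - 1) * (q - 1))  # inverse of e mod (p-1)(q-1)
    return pow(y, d, N)

N, e = 55, 3           # toy public key (5 * 11)
x = 42                 # the message Alice sends
y = pow(x, e, N)       # what Eve observes on the wire
print(recover_message(N, e, y) == x)
```

    The trial-division loop runs for about $\sqrt{N}$ steps, so for a 512-bit $N$ it is as hopeless as the exhaustive search over $x$ described above.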


    We close this note with a brief discussion of implementation issues for RSA. Since we have argued that breaking RSA is infeasible because factoring would take a very long time, we should check that the computations that Alice and Bob themselves have to perform are much simpler, and can be done efficiently. There are really only two non-trivial things that Alice and Bob have to do:

    1. Bob has to find prime numbers $p$ and $q$, each having many (say, 512) bits.

    2. Both Alice and Bob have to compute exponentials mod $N$. (Alice has to compute $x^e \bmod N$, and Bob has to compute $y^d \bmod N$.)

    Both of these tasks can be carried out efficiently. The first requires the implementation of an efficient test for primality as well as a rich source of primes. You will learn how to tackle each of these tasks in the algorithms course CS170. The second requires an efficient algorithm for modular exponentiation, which is not very difficult, but will also be discussed in detail in CS170.
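    One standard way to do modular exponentiation efficiently is repeated squaring. The sketch below is offered as an illustration under that assumption; it is not necessarily the presentation used in CS170, and the name `mod_exp` is hypothetical.

```python
def mod_exp(x, e, N):
    """Compute x**e mod N using O(log e) multiplications (repeated squaring)."""
    result = 1 % N
    base = x % N
    while e > 0:
        if e & 1:                      # current low bit of the exponent is set
            result = (result * base) % N
        base = (base * base) % N       # square for the next bit
        e >>= 1
    return result

print(mod_exp(2, 10, 1000))  # 2^10 = 1024, so this prints 24
```

    Because every intermediate value is reduced mod $N$, the numbers never grow beyond about $N^2$, even when the exponent has hundreds of bits.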

    To summarize, then, in the RSA protocol Bob need only perform simple calculations such as multiplication, exponentiation and primality testing to implement his digital lock. Similarly, Alice and Bob need only perform simple calculations to lock and unlock the message respectively, operations that any pocket computing device could handle. By contrast, to unlock the message without the key, Eve would have to perform operations like factoring large numbers, which (at least according to widely accepted belief) requires more computational power than all the world's most sophisticated computers combined! This

    compelling guarantee of security without the need for a previously shared secret key explains why the RSA cryptosystem is such a revolutionary development in cryptography.


    CS 70 Discrete Mathematics and Probability Theory

    Fall 2013 Vazirani Note 5

    Polynomials

    Polynomials constitute a rich class of functions which are both easy to describe and widely applicable in topics ranging from Fourier analysis to computational geometry. In this note, we will discuss properties of polynomials which make them so useful. We will then describe how to take advantage of these properties to develop a secret sharing scheme.

    Recall from your high school math that a polynomial in a single variable is of the form $p(x) = a_d x^d + a_{d-1} x^{d-1} + \ldots + a_0$. Here the variable $x$ and the coefficients $a_i$ are usually real numbers. For example, $p(x) = 5x^3 + 2x + 1$ is a polynomial of degree $d = 3$. Its coefficients are $a_3 = 5$, $a_2 = 0$, $a_1 = 2$, and $a_0 = 1$. Polynomials have some remarkably simple, elegant and powerful properties, which we will explore in this note.

    First, a definition: we say that $a$ is a root of the polynomial $p(x)$ if $p(a) = 0$. For example, the degree 2 polynomial $p(x) = x^2 - 4$ has two roots, namely 2 and $-2$, since $p(2) = p(-2) = 0$. If we plot the polynomial $p(x)$ in the $x$-$y$ plane, then the roots of the polynomial are just the places where the curve crosses the $x$-axis.

    We now state two fundamental properties of polynomials that we will prove in due course.

    Property 1: A non-zero polynomial of degree $d$ has at most $d$ roots.

    Property 2: Given $d+1$ pairs $(x_1, y_1), \ldots, (x_{d+1}, y_{d+1})$, with all the $x_i$ distinct, there is a unique polynomial $p(x)$ of degree (at most) $d$ such that $p(x_i) = y_i$ for $1 \leq i \leq d+1$.

    Let us consider what these two properties say in the case that $d = 1$. A graph of a linear (degree 1) polynomial $y = a_1 x + a_0$ is a line. Property 1 says that if a line is not the $x$-axis (i.e. if the polynomial is not $y = 0$), then it can intersect the $x$-axis in at most one point.


    Property 2 says that two points uniquely determine a line.

    Polynomial Interpolation

    Property 2 says that two points uniquely determine a degree 1 polynomial (a line), three points uniquely determine a degree 2 polynomial, four points uniquely determine a degree 3 polynomial, and so on. Given $d+1$ pairs $(x_1, y_1), \ldots, (x_{d+1}, y_{d+1})$, how do we determine the polynomial $p(x) = a_d x^d + \ldots + a_1 x + a_0$ such that $p(x_i) = y_i$ for $i = 1$ to $d+1$? We will give an efficient algorithm for reconstructing the coefficients $a_0, \ldots, a_d$, and therefore the polynomial $p(x)$.

    The method is called Lagrange interpolation: Let us start by solving an easier problem. Suppose that we are told that $y_1 = 1$ and $y_j = 0$ for $2 \leq j \leq d+1$. Now can we reconstruct $p(x)$? Yes, this is easy!

    Consider $q(x) = (x-x_2)(x-x_3)\cdots(x-x_{d+1})$. This is a polynomial of degree $d$ (the $x_i$'s are constants, and $x$ appears $d$ times). Also, we clearly have $q(x_j) = 0$ for $2 \leq j \leq d+1$. But what is $q(x_1)$? Well, $q(x_1) = (x_1-x_2)(x_1-x_3)\cdots(x_1-x_{d+1})$, which is some constant not equal to 0. Thus if we let $p(x) = q(x)/q(x_1)$ (dividing is ok since $q(x_1) \neq 0$), we have the polynomial we are looking for. For example, suppose you were given the pairs $(1, 1)$, $(2, 0)$, and $(3, 0)$. Then we can construct the degree $d = 2$ polynomial $p(x)$ by letting $q(x) = (x-2)(x-3) = x^2 - 5x + 6$, and $q(x_1) = q(1) = 2$. Thus, we can now construct $p(x) = q(x)/q(x_1) = (x^2 - 5x + 6)/2$.

    Of course the problem is no harder if we single out some arbitrary index $i$ instead of 1: i.e. $y_i = 1$ and $y_j = 0$ for $j \neq i$. Let us introduce some notation: let us denote by $\Delta_i(x)$ the degree $d$ polynomial that goes through these $d+1$ points. Then

    $$\Delta_i(x) = \frac{\prod_{j \neq i}(x - x_j)}{\prod_{j \neq i}(x_i - x_j)}.$$

    Let us now return to the original problem. Given $d+1$ pairs $(x_1, y_1), \ldots, (x_{d+1}, y_{d+1})$, we first construct the $d+1$ polynomials $\Delta_1(x), \ldots, \Delta_{d+1}(x)$. Now we can write $p(x) = \sum_{i=1}^{d+1} y_i \Delta_i(x)$. Why does this work? First notice that $p(x)$ is a polynomial of degree at most $d$ as required, since it is the sum of polynomials of degree $d$. And when it is evaluated at $x_i$, $d$ of the $d+1$ terms in the sum evaluate to 0 and the $i$-th term evaluates to $y_i$ times 1, as required.

    As an example, suppose we want to find the degree-2 polynomial $p(x)$ that passes through the three points $(1, 1)$, $(2, 2)$ and $(3, 4)$. The three polynomials $\Delta_i$ are as follows: If $d = 2$, and $x_i = i$, for instance, then

    $$\Delta_1(x) = \frac{(x-2)(x-3)}{(1-2)(1-3)} = \frac{(x-2)(x-3)}{2} = \frac{1}{2}x^2 - \frac{5}{2}x + 3;$$

    $$\Delta_2(x) = \frac{(x-1)(x-3)}{(2-1)(2-3)} = \frac{(x-1)(x-3)}{-1} = -x^2 + 4x - 3;$$

    $$\Delta_3(x) = \frac{(x-1)(x-2)}{(3-1)(3-2)} = \frac{(x-1)(x-2)}{2} = \frac{1}{2}x^2 - \frac{3}{2}x + 1.$$

    The polynomial $p(x)$ is therefore given by

    $$p(x) = 1 \cdot \Delta_1(x) + 2 \cdot \Delta_2(x) + 4 \cdot \Delta_3(x) = \frac{1}{2}x^2 - \frac{1}{2}x + 1.$$

    You should verify that this polynomial does indeed pass through the above three points.
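    The interpolation procedure can also be carried out mechanically. The following sketch uses exact rational arithmetic so that the fractions in the example come out exactly; the helper names `poly_mul` and `lagrange_interpolate`, and the choice to store coefficients lowest degree first, are assumptions made for this illustration.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def lagrange_interpolate(points):
    """Return [a_0, ..., a_d] for the polynomial through the d+1 given points."""
    n = len(points)
    result = [Fraction(0)] * n
    for i, (xi, yi) in enumerate(points):
        # Build Delta_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j).
        delta = [Fraction(1)]
        for j, (xj, _) in enumerate(points):
            if j != i:
                delta = poly_mul(delta, [Fraction(-xj), Fraction(1)])
                delta = [c / (xi - xj) for c in delta]
        # Add y_i * Delta_i(x) to the running sum.
        for k, c in enumerate(delta):
            result[k] += yi * c
    return result

coeffs = lagrange_interpolate([(1, 1), (2, 2), (3, 4)])
print(coeffs)  # constant term first: 1, -1/2, 1/2
```

    Read from the highest degree down, the result is $\frac{1}{2}x^2 - \frac{1}{2}x + 1$, matching the polynomial found by hand.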

    Proof of Property 2

    We would like to prove property 2:

    Property 2: Given $d+1$ pairs $(x_1, y_1), \ldots, (x_{d+1}, y_{d+1})$, with all the $x_i$ distinct, there is a unique polynomial $p(x)$ of degree (at most) $d$ such that $p(x_i) = y_i$ for $1 \leq i \leq d+1$.

    We have shown how to find a polynomial $p(x)$ such that $p(x_i) = y_i$ for $d+1$ pairs $(x_1, y_1), \ldots, (x_{d+1}, y_{d+1})$. This proves part of property 2 (the existence of the polynomial). How do we prove the second part, that the polynomial is unique? Suppose for contradiction that there is another polynomial $q(x)$ of degree at most $d$ such that $q(x_i) = y_i$ for all $d+1$ pairs above. Now consider the polynomial $r(x) = p(x) - q(x)$. Since we are assuming that $q(x)$ and $p(x)$ are different polynomials, $r(x)$ must be a non-zero polynomial of degree at most $d$. Therefore, property 1 implies it can have at most $d$ roots. But on the other hand $r(x_i) = p(x_i) - q(x_i) = 0$ on $d+1$ distinct points. Contradiction. Therefore, $p(x)$ is the unique polynomial that satisfies the $d+1$ conditions.

    Polynomial Division

    Let's take a short digression to discuss polynomial division, which will be useful in the proof of property 1. If we have a polynomial $p(x)$ of degree $d$, we can divide by a polynomial $q(x)$ of degree $\leq d$ by using long division. The result will be:

    $$p(x) = q(x)q'(x) + r(x)$$

    where $q'(x)$ is the quotient and $r(x)$ is the remainder. The degree of $r(x)$ must be smaller than the degree of the divisor $q(x)$.

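    Example. We wish to divide $p(x) = x^3 + x^2 - 1$ by $q(x) = x - 1$:

                      x^2 + 2x + 2
                 -----------------
        x - 1 )   x^3 +  x^2        - 1
                 -x^3 +  x^2
                        2x^2
                       -2x^2 + 2x
                               2x - 1
                              -2x + 2
                                    1

    Now $p(x) = x^3 + x^2 - 1 = (x-1)(x^2 + 2x + 2) + 1$, so the remainder is $r(x) = 1$ and the quotient is $q'(x) = x^2 + 2x + 2$.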
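    The long division above follows a fixed recipe: repeatedly cancel the leading term of what remains. A minimal sketch of that recipe is given below; the function name `poly_divmod` and the convention of listing coefficients from the highest degree down are assumptions made for this illustration.

```python
from fractions import Fraction

def poly_divmod(p, q):
    """Divide p by q (coefficient lists, highest degree first).

    Returns (quotient, remainder) with len(remainder) == len(q) - 1.
    """
    if not q or q[0] == 0:
        raise ValueError("divisor must have a non-zero leading coefficient")
    rem = [Fraction(c) for c in p]
    quot = []
    # Each pass cancels the current leading term of the remainder.
    for _ in range(len(p) - len(q) + 1):
        factor = rem[0] / q[0]
        quot.append(factor)
        for i, c in enumerate(q):
            rem[i] -= factor * c
        rem.pop(0)  # the leading coefficient is now zero, so drop it
    return quot, rem

quotient, remainder = poly_divmod([1, 1, 0, -1], [1, -1])
print(quotient, remainder)  # quotient x^2 + 2x + 2, remainder 1
```

    Using `Fraction` keeps the arithmetic exact when the divisor's leading coefficient is not 1.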


    Proof of Property 1

    Now let us turn to property 1: a non-zero polynomial of degree $d$ has at most $d$ roots. The idea of the proof is as follows. We will prove the following claims:

    Claim 1: If $a$ is a root of a polynomial $p(x)$ with degree $d$, then $p(x) = (x-a)q(x)$ for a polynomial $q(x)$ with degree $d-1$.

    Claim 2: A polynomial $p(x)$ of degree $d$ with distinct roots $a_1, \ldots, a_d$ can be written as $p(x) = c(x-a_1)\cdots(x-a_d)$.

    Claim 2 implies property 1. If $p(x)$ has fewer than $d$ roots there is nothing to prove, so suppose it has distinct roots $a_1, \ldots, a_d$. We must show that any $a \neq a_i$ for $i = 1, \ldots, d$ cannot be a root of $p(x)$. But this follows from claim 2, since $p(a) = c(a-a_1)\cdots(a-a_d) \neq 0$ (here $c \neq 0$ because $p(x)$ has degree $d$).

    Proof of Claim 1

    Dividing $p(x)$ by $(x-a)$ gives $p(x) = (x-a)q(x) + r(x)$, where $q(x)$ is the quotient and $r(x)$ is the remainder. The degree of $r(x)$ is necessarily smaller than the degree of the divisor $(x-a)$. Therefore $r(x)$ must have degree 0 and therefore is some constant $c$. But now substituting $x = a$, we get $p(a) = c$. But since $a$ is a root, $p(a) = 0$. Thus $c = 0$ and therefore $p(x) = (x-a)q(x)$, thus showing that $(x-a) \mid p(x)$.

    Claim 1 implies Claim 2

    Proof by induction on $d$.

    Base Case: We must show that a polynomial $p(x)$ of degree 1 with root $a_1$ can be written as $p(x) = c(x-a_1)$. By Claim 1, we know that $p(x) = (x-a_1)q(x)$, where $q(x)$ has degree 0 and is therefore a constant.

    Inductive Hypothesis: A polynomial of degree $d-1$ with distinct roots $a_1, \ldots, a_{d-1}$ can be written as $p(x) = c(x-a_1)\cdots(x-a_{d-1})$.

    Inductive Step: Let $p(x)$ be a polynomial of degree $d$ with distinct roots $a_1, \ldots, a_d$. By Claim 1, $p(x) = (x-a_d)q(x)$ for some polynomial $q(x)$ of degree $d-1$. Since $0 = p(a_i) = (a_i - a_d)q(a_i)$ for all $i \neq d$ and $a_i - a_d \neq 0$ in this case, $q(a_i)$ must be equal to 0. Then $q(x)$ is a polynomial of degree $d-1$ with distinct roots $a_1, \ldots, a_{d-1}$. We can now apply the inductive assumption to $q(x)$ to write $q(x) = c(x-a_1)\cdots(x-a_{d-1})$. Substituting in $p(x) = (x-a_d)q(x)$, we finally obtain that $p(x) = c(x-a_1)\cdots(x-a_d)$.

    Finite Fields

    Both property 1 and property 2 also hold when the values of the coefficients and the variable $x$ are chosen from the complex numbers instead of the real numbers or even the rational numbers. They do not hold if the values are restricted to being natural numbers or integers. Let us try to understand this a little more closely. The only properties of numbers that we used in polynomial interpolation and in the proof of property 1 are that we can add, subtract, multiply and divide any pair of numbers as long as we are not dividing by 0. We cannot subtract two natural numbers and guarantee that the result is a natural number. And dividing two integers does not usually result in an integer.


    But if we work with numbers modulo a prime $m$, then we can add, subtract, multiply and divide (by any non-zero number modulo $m$). To check this, recall that $x$ has an inverse mod $m$ if $\gcd(m, x) = 1$, so if $m$ is prime all the numbers $\{1, \ldots, m-1\}$ have an inverse mod $m$. So both property 1 and property 2 hold if the coefficients and the variable $x$ are restricted to take on values modulo $m$. This remarkable fact that these properties hold even when we restrict ourselves to a finite set of values is the key to several applications that we will presently see.
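    The claim about inverses is easy to check by brute force for small moduli. The sketch below (the function name `inverses_mod` is a hypothetical choice) tabulates the inverse of each non-zero residue and contrasts a prime modulus with a composite one.

```python
def inverses_mod(m):
    """Map each x in {1, ..., m-1} that has an inverse mod m to that inverse."""
    table = {}
    for x in range(1, m):
        for y in range(1, m):
            if (x * y) % m == 1:   # y is the inverse of x
                table[x] = y
                break
    return table

print(inverses_mod(7))  # prime modulus: every non-zero residue appears
print(inverses_mod(6))  # composite modulus: only 1 and 5 appear
```

    With modulus 6, division by 2, 3 or 4 is undefined, which is why the arguments of this note require a prime modulus.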

    Let us consider an example of degree $d = 1$ polynomials modulo 5. Let $p(x) = 2x + 3 \pmod{5}$. The roots of this polynomial are all values $x$ such that $2x + 3 \equiv 0 \pmod{5}$ holds. Solving for $x$, we get that $2x \equiv -3 \equiv 2 \pmod{5}$ or $x \equiv 1 \pmod{5}$. Note that this is consistent with property 1 since we got only 1 root of a degree 1 polynomial.

    Now consider the polynomials $p(x) = 2x + 3$ and $q(x) = 3x - 2$ with all numbers reduced mod 5. We can plot the value of each polynomial $y$ as a function of $x$ in the $x$-$y$ plane. Since we are working modulo 5, there are only 5 possible choices for $x$, and only 5 possible choices for $y$:

    Notice that these two "lines" intersect in exactly one point, even though the picture looks nothing at all like lines in the Euclidean plane! Looking at these graphs it might seem remarkable that both property 1 and property 2 hold when we work modulo $m$ for any prime number $m$. But as we stated above, all that was required for the proofs of property 1 and 2 was the ability to add, subtract, multiply and divide any pair of numbers (as long as we are not dividing by 0), and they hold whenever we work modulo a prime $m$.
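    Since there are only five possible inputs modulo 5, the root of $p(x) = 2x + 3$ and the intersection of the two "lines" $p(x) = 2x + 3$ and $q(x) = 3x - 2$ can be found by simply tabulating both polynomials. This is a small sketch; the variable names are chosen for the illustration.

```python
m = 5

def p(x):
    return (2 * x + 3) % m

def q(x):
    return (3 * x - 2) % m   # Python's % always returns a value in 0..m-1

# Tabulate both polynomials over all of GF(5).
table = [(x, p(x), q(x)) for x in range(m)]
roots_of_p = [x for x in range(m) if p(x) == 0]
intersections = [(x, p(x)) for x in range(m) if p(x) == q(x)]

print(table)
print(roots_of_p)      # [1]
print(intersections)   # [(0, 3)]
```

    The single intersection at $x = 0$ agrees with solving $2x + 3 \equiv 3x - 2 \pmod{5}$ by hand, which gives $x \equiv 5 \equiv 0$.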

    When we work with numbers modulo a prime $m$, we say that we are working over a finite field, denoted by $F_m$ or $GF(m)$ (for Galois Field). In order for a set to be called a field, it must satisfy certain axioms which are the building blocks that allow for these amazing properties and others to hold. If you would like to learn more about fields and the axioms they satisfy, you can visit Wikipedia's site and read the article on fields: http://en.wikipedia.org/wiki/Field_%28mathematics%29. While you are there, you can also read the article on Galois Fields and learn more about some of their applications and elegant properties which will not be covered in this lecture: http://en.wikipedia.org/wiki/Galois_field.

    Counting

    How many polynomials of degree (at most) 2 are there modulo $m$? This is easy: there are 3 coefficients, each of which can take on one of $m$ values for a total of $m^3$. Writing $p(x) = a_d x^d + a_{d-1} x^{d-1} + \ldots + a_0$ by specifying its $d+1$ coefficients $a_i$ is known as the coefficient representation of $p(x)$. Is there any other way to specify $p(x)$?

    Sure, there is! Our polynomial of degree (at most) 2 is uniquely specified by its values at any three points, say $x = 0, 1, 2$. Once again each of these three values can take on one of $m$ values, for a total of $m^3$ possibilities. In general, we can specify a degree $d$ polynomial $p(x)$ by specifying its values at $d+1$ points, say $0, 1, \ldots, d$. These $d+1$ values, $(y_0, y_1, \ldots, y_d)$, are called the value representation of $p(x)$. The coefficient representation


    over such a devastating and destructive weapon. Suppose the U.S. government finally decides that a nuclear strike can be initiated only if at least $k > 1$ major officials agree to it. We want to devise a scheme such that (1) any group of $k$ of these officials can pool their information to figure out the launch code and initiate the strike but (2) no group of $k-1$ or fewer have any information about the launch code, even if they pool their knowledge. For example, they should not learn whether the secret is odd or even, a prime number, divisible by some number $a$, or the secret's least significant bit. How can we accomplish this?

    Suppose that there are $n$ officials indexed from 1 to $n$ and the launch code is some natural number $s$. Let $q$ be a prime number larger than $n$ and $s$. We will work over $GF(q)$ from now on.

    Now pick a random polynomial $P(x)$ of degree $k-1$ such that $P(0) = s$ and give $P(1)$ to the first official, $P(2)$ to the second, ..., $P(n)$ to the $n$-th. Then

    Any $k$ officials, having the values of the polynomial at $k$ points, can use Lagrange interpolation to find $P$, and once they know what $P$ is, they can compute $P(0) = s$ to learn the secret. Another way to say this is that any $k$ officials have between them a value representation of the polynomial, which they can convert to the coefficient representation, which allows them to evaluate $P(0) = s$.

    Any group of $k-1$ officials has no information about $s$. They know only $k-1$ points through which $P(x)$, an unknown polynomial of degree $k-1$, passes. They wish to reconstruct $P(0)$. But by our discussion in the previous section, for each possible value $P(0) = b$, there is a unique polynomial of degree (at most) $k-1$ that passes through the $k-1$ points of the $k-1$ officials as well as through $(0, b)$. Hence the secret could be any of the $q$ possible values $\{0, 1, \ldots, q-1\}$, so the officials have, in a very precise sense, no information about $s$. Another way of saying this is that the information of the officials is consistent with $q$ different value representations, one for each possible value of the secret, and thus the officials have no information about $s$. (Note that this is the main reason we choose to work over finite fields rather than, say, over the real numbers, where the basic secret-sharing scheme would still work. Because there are only finitely many values in our field, we can quantify precisely how many remaining possibilities there are for the value of the secret, and show that this is the same as if the officials had no information at all.)
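    The scheme just described can be sketched in a few lines. The code below is an illustration under stated assumptions rather than a production implementation: the function names `make_shares` and `recover_secret` are hypothetical, the random coefficients are drawn with Python's `secrets` module, the polynomial has degree at most $k-1$ (the leading coefficient may happen to be 0), and the prime 7919 in the usage example is an arbitrary choice.

```python
import secrets

def make_shares(secret, k, n, q):
    """Split `secret` into n shares over GF(q) so that any k shares determine it."""
    assert 0 <= secret < q and n < q and 1 <= k <= n
    # Random polynomial P of degree at most k-1 with P(0) = secret.
    coeffs = [secret] + [secrets.randbelow(q) for _ in range(k - 1)]

    def P(x):
        return sum(c * pow(x, i, q) for i, c in enumerate(coeffs)) % q

    return [(i, P(i)) for i in range(1, n + 1)]  # official i receives P(i)

def recover_secret(shares, q):
    """Evaluate the interpolating polynomial at x = 0 using Lagrange's formula mod q."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = (num * (0 - xj)) % q      # numerator of Delta_i(0)
                den = (den * (xi - xj)) % q     # denominator of Delta_i
        secret = (secret + yi * num * pow(den, -1, q)) % q
    return secret

shares = make_shares(secret=1234, k=3, n=5, q=7919)
print(recover_secret(shares[:3], 7919))                           # 1234
print(recover_secret([shares[0], shares[2], shares[4]], 7919))    # 1234
```

    Evaluating directly at $x = 0$ avoids computing the full coefficient representation, since only $P(0)$ is needed.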

    Example. Suppose you are in charge of setting up a secret sharing scheme, with secret $s = 1$, where you want to distribute $n = 5$ shares to 5 people such that any $k = 3$ or more people can figure out the secret, but two or fewer cannot. Let's say we are working over $GF(7)$ (note that $7 > s$ and $7 > n$) and you randomly choose the following polynomial of degree $k - 1 = 2$: $P(x) = 3x^2 + 5x + 1$ (here, $P(0) = 1 = s$, the secret). So you know everything there is to know about the secret and the polynomial, but what about the people that receive the shares? Well, the shares handed out are $P(1) = 2$ to the first official, $P(2) = 2$ to the second, $P(3) = 1$ to the third, $P(4) = 6$ to the fourth, and $P(5) = 3$ to the fifth official. Let's say that officials 3, 4, and 5 get together (we expect them to be able to recover the secret). Using Lagrange interpolation, they compute the following delta functions:

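    $$\Delta_3(x) = \frac{(x-4)(x-5)}{(3-4)(3-5)} = \frac{(x-4)(x-5)}{2} = 4(x-4)(x-5);$$

    $$\Delta_4(x) = \frac{(x-3)(x-5)}{(4-3)(4-5)} = \frac{(x-3)(x-5)}{-1} = 6(x-3)(x-5);$$

    $$\Delta_5(x) = \frac{(x-3)(x-4)}{(5-3)(5-4)} = \frac{(x-3)(x-4)}{2} = 4(x-3)(x-4).$$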

    They then compute the polynomial over $GF(7)$: $P(x) = (1)\Delta_3(x) + (6)\Delta_4(x) + (3)\Delta_5(x) = 3x^2 + 5x + 1$ (verify the computation!). Now they simply compute $P(0)$ and discover that the secret is 1.
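    For readers who want to check the computation, one way to expand the sum is sketched below. It uses the inverses $2^{-1} \equiv 4$ and $(-1)^{-1} \equiv 6 \pmod{7}$ for the delta functions of officials 3, 4 and 5, and reduces the products $6 \cdot 6 = 36 \equiv 1$ and $3 \cdot 4 = 12 \equiv 5$ before expanding.

```latex
\begin{align*}
P(x) &\equiv 1 \cdot 4(x-4)(x-5) + 6 \cdot 6(x-3)(x-5) + 3 \cdot 4(x-3)(x-4) \\
     &\equiv 4(x^2 - 9x + 20) + (x^2 - 8x + 15) + 5(x^2 - 7x + 12) \\
     &= 10x^2 - 79x + 155 \\
     &\equiv 3x^2 + 5x + 1 \pmod{7}.
\end{align*}
```

    The last step uses $10 \equiv 3$, $-79 \equiv 5$ and $155 \equiv 1 \pmod{7}$.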


    Let's see what happens if two officials try to get together, say persons 1 and 5. They both know that the polynomial looks like $P(x) = a_2 x^2 + a_1 x + s$. They also know the following equations:

    $$P(1) = a_2 + a_1 + s = 2$$

    $$P(5) = 4a_2 + 5a_1 + s = 3$$

    But that is all they have: two equations with three unknowns, and thus they cannot find out the secret. This is the case no matter which two officials get together. Notice that since we are working over $GF(7)$, the two people could have guessed the secret ($0 \leq s \leq 6$) and constructed a unique degree 2 polynomial (by property 2). But the two people combined have the same chance of guessing what the secret is as they do individually. This is important, as it implies that two people have no more information about the secret than one person does.
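    The claim that officials 1 and 5 learn nothing can be checked exhaustively in $GF(7)$: for every candidate secret there is exactly one pair $(a_2, a_1)$ consistent with their two shares. The sketch below enumerates all possibilities; the variable names are chosen for the illustration, and polynomials with $a_2 = 0$ are included (degree at most 2).

```python
q = 7
shares = {1: 2, 5: 3}  # the values P(1) = 2 and P(5) = 3 from the example

consistent = {}
for s in range(q):  # every candidate secret
    consistent[s] = [
        (a2, a1)
        for a2 in range(q)
        for a1 in range(q)
        if all((a2 * x * x + a1 * x + s) % q == y for x, y in shares.items())
    ]

for s, sols in consistent.items():
    print(s, sols)  # exactly one (a2, a1) pair per candidate secret
```

    The entry for $s = 1$ is $(a_2, a_1) = (3, 5)$, the polynomial actually used, but nothing in the two shares distinguishes it from the six alternatives.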


    CS 70 Discrete Mathematics and Probability Theory

    Fall 2013 Vazirani Note 6

    Error Correcting Codes

    We will consider two situations in which we wish to transmit information on an unreliable channel. The first is exemplified by the internet, where the information (say a file) is broken up into packets, and the unreliability is manifest in the fact that some of the packets are lost (or erased) during transmission. Moreover the packets are labeled so that the recipient knows exactly which packets were received and which were dropped. We will refer to such errors as erasure errors. See the figure below:

    In the second situation, some of the packets are corrupted during transmission due to channel noise. Now the recipient has no idea which packets were corrupted and which were received unmodified:

    In the above example, packets 1 and 4 are corrupted. These types of errors are called general errors. We will discuss methods of encoding messages, called error correcting codes, which are capable of correcting both erasure and general errors.

    Assume that the information consists of $n$ packets. We can assume without loss of generality that the contents of each packet is a number modulo $q$ (denoted by $GF(q)$), where $q$ is a prime. For example, the contents of the packet might be a 32-bit string and can therefore be regarded as a number between 0 and $2^{32} - 1$; then we could choose $q$ to be any prime larger than $2^{32}$. The properties of polynomials over $GF(q)$ (i.e., with coefficients and values reduced modulo $q$) are the backbone of both error-correcting schemes. To see this, let us denote the message to be sent by $m_1, \ldots, m_n$ and make the following crucial observations:

    1) There is a unique polynomial $P(x)$ of degree $n-1$ such that $P(i) = m_i$ for $1 \leq i \leq n$ (i.e., $P(x)$ contains all of the information about the message, and evaluating $P(i)$ gives the contents of the $i$-th packet).

    2) The message to be sent is now $m_1 = P(1), \ldots, m_n = P(n)$. We can generate additional packets by evaluating $P(x)$ at additional points $n+1, n+2, \ldots, n+j$ (remember, our transmitted message must be redundant, i.e., it must contain more packets than the original message to account for the lost or corrupted packets). Thus the transmitted message is $c_1 = P(1), c_2 = P(2), \ldots, c_{n+j} = P(n+j)$. Since we are working modulo $q$, we must make sure that $n + j \leq q$, but this condition does not impose a serious constraint since $q$ is very large.

    Erasure Errors

    Here we consider the setting of packets being transmitted over the internet. In this setting, the packets are

    labeled and so the recipient knows exactly which packets were dropped during transmission. One additional

    observation will be useful:


    3) By Property 2 in Note 5, we can uniquely reconstruct $P(x)$ from its values at any $n$ distinct points, since it has degree $n-1$. This means that $P(x)$ can be reconstructed from any $n$ of the transmitted packets. Evaluating this reconstructed polynomial $P(x)$ at $x = 1, \ldots, n$ yields the original message $m_1, \ldots, m_n$.

    Recall that in our scheme, the transmitted message is $c_1 = P(1), c_2 = P(2), \ldots, c_{n+j} = P(n+j)$. Thus, if we hope to be able to correct $k$ errors, we simply need to set $j = k$. The encoded message will then consist of $n + k$ packets.
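    The encoding and erasure-recovery steps can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names `interpolate_at`, `encode` and `decode`, the representation of received packets as a dictionary from packet index to value, and the sample message are all assumptions made for the example.

```python
def interpolate_at(points, x, q):
    """Evaluate at x the unique polynomial of degree < len(points) through points, mod q."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = (num * (x - xj)) % q
                den = (den * (xi - xj)) % q
        total = (total + yi * num * pow(den, -1, q)) % q
    return total

def encode(message, k, q):
    """Send P(1), ..., P(n+k), where P(i) = message[i-1] for 1 <= i <= n."""
    n = len(message)
    assert n + k <= q
    known = list(enumerate(message, start=1))
    return [interpolate_at(known, x, q) for x in range(1, n + k + 1)]

def decode(received, n, q):
    """Recover the message from any n surviving packets {index: value}."""
    points = list(received.items())[:n]
    assert len(points) == n, "too many packets were lost"
    return [interpolate_at(points, x, q) for x in range(1, n + 1)]

message = [3, 1, 5, 0]                     # n = 4 packets over GF(7)
codeword = encode(message, k=2, q=7)       # n + k = 6 packets are sent
# Suppose packets 2 and 5 are lost in transit.
received = {i: c for i, c in enumerate(codeword, start=1) if i not in (2, 5)}
print(decode(received, n=4, q=7) == message)
```

    Because the first $n$ transmitted packets are the message itself, a recipient who loses nothing can read the message directly without interpolating.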

    Example

    Suppose Alice wants to send Bob a message of $n = 4$ packets and she wants to guard against $k = 2$ lost packets. Then, assuming the packets can be coded up as integers between 0 and 6, Alice can work over