59
Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 1 of 9

Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Maximum Likelihood Estimation

INFO-2301: Quantitative Reasoning 2Michael Paul and Jordan Boyd-GraberMARCH 7, 2017

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 1 of 9

Page 2: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Why MLE?

• Before: Distribution + Parameter→ x

• Now: x + Distribution→ Parameter

• (Much more realistic)

• But: Says nothing about how good a fit a distribution is

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 2 of 9

Page 3: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Likelihood

• Likelihood is p(x;θ )

• We want estimate of θ that best explains data we seen

• I.e., Maximum Likelihood Estimate (MLE)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 3 of 9

Page 4: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Likelihood

• The likelihood function refers to the PMF (discrete) or PDF (continuous).

• For discrete distributions, the likelihood of x is P(X = x).

• For continuous distributions, the likelihood of x is the density f (x).

• We will often refer to likelihood rather than probability/mass/density sothat the term applies to either scenario.

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 4 of 9

Page 5: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Optimizing Unconstrained Functions

Suppose we wanted to optimize

`= x2−2x +2 (1)

-5 -4 -3 -2 -1 0 1 2 3 4 5

-3

-2

-1

1

2

3

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 5 of 9

Page 6: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Optimizing Unconstrained Functions

Suppose we wanted to optimize

`= x2−2x +2 (1)∂ `

∂ x=−2x −2 (2)

-5 -4 -3 -2 -1 0 1 2 3 4 5

-3

-2

-1

1

2

3

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 5 of 9

Page 7: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Optimizing Unconstrained Functions

∂ `

∂ x=0 (3)

−2x −2=0 (4)

x =−1 (5)

(Should also check that second derivative is negative)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 6 of 9

Page 8: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Optimizing Constrained Functions

Theorem: Lagrange Multiplier Method

Given functions f (x1, . . .xn) and g(x1, . . .xn), the critical points of frestricted to the set g = 0 are solutions to equations:

∂ f

∂ xi(x1, . . .xn) =λ

∂ g

∂ xi(x1, . . .xn) ∀i

g(x1, . . .xn) = 0

This is n+1 equations in the n+1 variables x1, . . .xn,λ.

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 7 of 9

Page 9: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Lagrange Example

Maximize `(x ,y) =p

xy subject to the constraint 20x +10y = 200.

• Compute derivatives

∂ `

∂ x=

1

2

s

y

x

∂ g

∂ x= 20

∂ `

∂ y=

1

2

√x

y

∂ g

∂ y= 10

• Create new systems of equations

1

2

s

y

x= 20λ

1

2

√x

y= 10λ

20x +10y = 200

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 8 of 9

Page 10: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Lagrange Example

Maximize `(x ,y) =p

xy subject to the constraint 20x +10y = 200.

• Compute derivatives

∂ `

∂ x=

1

2

s

y

x

∂ g

∂ x= 20

∂ `

∂ y=

1

2

√x

y

∂ g

∂ y= 10

• Create new systems of equations

1

2

s

y

x= 20λ

1

2

√x

y= 10λ

20x +10y = 200

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 8 of 9

Page 11: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Lagrange Example

Maximize `(x ,y) =p

xy subject to the constraint 20x +10y = 200.

• Compute derivatives

∂ `

∂ x=

1

2

s

y

x

∂ g

∂ x= 20

∂ `

∂ y=

1

2

√x

y

∂ g

∂ y= 10

• Create new systems of equations

1

2

s

y

x= 20λ

1

2

√x

y= 10λ

20x +10y = 200

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 8 of 9

Page 12: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Lagrange Example

Maximize `(x ,y) =p

xy subject to the constraint 20x +10y = 200.

• Compute derivatives

∂ `

∂ x=

1

2

s

y

x

∂ g

∂ x= 20

∂ `

∂ y=

1

2

√x

y

∂ g

∂ y= 10

• Create new systems of equations

1

2

s

y

x= 20λ

1

2

√x

y= 10λ

20x +10y = 200

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 8 of 9

Page 13: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Lagrange Example

• Dividing the first equation by the second gives us

y

x= 2 (6)

• which means y = 2x , plugging this into the constraint equation gives:

20x +10(2x) = 200

x = 5⇒ y = 10

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 9 of 9

Page 14: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Maximum Likelihood Estimation

INFO-2301: Quantitative Reasoning 2Michael Paul and Jordan Boyd-GraberMARCH 7, 2017

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 1 of 4

Page 15: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Continuous Distribution: Gaussian

• Recall the density function

f (x) =1

p2πσ2

exp

−(x −µ)2

2σ2

(1)

• Taking the log makes math easier, doesn’t change answer (monotonic)

• If we observe x1 . . .xN , then log likelihood is

`(µ,σ)≡ −N logσ−N

2log(2π)−

1

2σ2

i

(xi −µ)2 (2)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 2 of 4

Page 16: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Continuous Distribution: Gaussian

• Recall the density function

f (x) =1

p2πσ2

exp

−(x −µ)2

2σ2

(1)

• Taking the log makes math easier, doesn’t change answer (monotonic)

• If we observe x1 . . .xN , then log likelihood is

`(µ,σ)≡ −N logσ−N

2log(2π)−

1

2σ2

i

(xi −µ)2 (2)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 2 of 4

Page 17: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Continuous Distribution: Gaussian

• Recall the density function

f (x) =1

p2πσ2

exp

−(x −µ)2

2σ2

(1)

• Taking the log makes math easier, doesn’t change answer (monotonic)

• If we observe x1 . . .xN , then log likelihood is

`(µ,σ)≡ −N logσ−N

2log(2π)−

1

2σ2

i

(xi −µ)2 (2)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 2 of 4

Page 18: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Continuous Distribution: Gaussian

• Recall the density function

f (x) =1

p2πσ2

exp

−(x −µ)2

2σ2

(1)

• Taking the log makes math easier, doesn’t change answer (monotonic)

• If we observe x1 . . .xN , then log likelihood is

`(µ,σ)≡ −N logσ−N

2log(2π)−

1

2σ2

i

(xi −µ)2 (2)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 2 of 4

Page 19: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Continuous Distribution: Gaussian

• Recall the density function

f (x) =1

p2πσ2

exp

−(x −µ)2

2σ2

(1)

• Taking the log makes math easier, doesn’t change answer (monotonic)

• If we observe x1 . . .xN , then log likelihood is

`(µ,σ)≡ −N logσ−N

2log(2π)−

1

2σ2

i

(xi −µ)2 (2)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 2 of 4

Page 20: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Gaussian µ

`(µ,σ) =−N logσ−N

2log(2π)−

1

2σ2

i

(xi −µ)2 (3)

∂ `

∂ µ=0+

1

σ2

i

(xi −µ) (4)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 3 of 4

Page 21: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Gaussian µ

`(µ,σ) =−N logσ−N

2log(2π)−

1

2σ2

i

(xi −µ)2 (3)

∂ `

∂ µ=0+

1

σ2

i

(xi −µ) (4)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 3 of 4

Page 22: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Gaussian µ

`(µ,σ) =−N logσ−N

2log(2π)−

1

2σ2

i

(xi −µ)2 (3)

∂ `

∂ µ=0+

1

σ2

i

(xi −µ) (4)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 3 of 4

Page 23: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Gaussian µ

`(µ,σ) =−N logσ−N

2log(2π)−

1

2σ2

i

(xi −µ)2 (3)

∂ `

∂ µ=0+

1

σ2

i

(xi −µ) (4)

Solve for µ:

0=1

σ2

i

(xi −µ) (5)

0=∑

i

xi −Nµ (6)

µ=

i xi

N(7)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 3 of 4

Page 24: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Gaussian µ

`(µ,σ) =−N logσ−N

2log(2π)−

1

2σ2

i

(xi −µ)2 (3)

∂ `

∂ µ=0+

1

σ2

i

(xi −µ) (4)

Solve for µ:

0=1

σ2

i

(xi −µ) (5)

0=∑

i

xi −Nµ (6)

µ=

i xi

N(7)

Consistent with what we said before

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 3 of 4

Page 25: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Gaussian σ

`(µ,σ) =−N logσ−N

2log(2π)−

1

2σ2

i

(xi −µ)2 (8)

∂ `

∂ σ=−

N

σ+0+

1

σ3

i

(xi −µ)2 (9)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 4 of 4

Page 26: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Gaussian σ

`(µ,σ) =−N logσ−N

2log(2π)−

1

2σ2

i

(xi −µ)2 (8)

∂ `

∂ σ=−

N

σ+0+

1

σ3

i

(xi −µ)2 (9)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 4 of 4

Page 27: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Gaussian σ

`(µ,σ) =−N logσ−N

2log(2π)−

1

2σ2

i

(xi −µ)2 (8)

∂ `

∂ σ=−

N

σ+0+

1

σ3

i

(xi −µ)2 (9)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 4 of 4

Page 28: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Gaussian σ

`(µ,σ) =−N logσ−N

2log(2π)−

1

2σ2

i

(xi −µ)2 (8)

∂ `

∂ σ=−

N

σ+0+

1

σ3

i

(xi −µ)2 (9)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 4 of 4

Page 29: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Gaussian σ

`(µ,σ) =−N logσ−N

2log(2π)−

1

2σ2

i

(xi −µ)2 (8)

∂ `

∂ σ=−

N

σ+0+

1

σ3

i

(xi −µ)2 (9)

Solve for σ:

0=−n

σ+

1

σ3

i

(xi −µ)2 (10)

N

σ=

1

σ3

i

(xi −µ)2 (11)

σ2 =

i(xi −µ)2

N(12)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 4 of 4

Page 30: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Gaussian σ

`(µ,σ) =−N logσ−N

2log(2π)−

1

2σ2

i

(xi −µ)2 (8)

∂ `

∂ σ=−

N

σ+0+

1

σ3

i

(xi −µ)2 (9)

Solve for σ:

0=−n

σ+

1

σ3

i

(xi −µ)2 (10)

N

σ=

1

σ3

i

(xi −µ)2 (11)

σ2 =

i(xi −µ)2

N(12)

Consistent with what we said before

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 4 of 4

Page 31: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Maximum Likelihood Estimation

INFO-2301: Quantitative Reasoning 2Michael Paul and Jordan Boyd-GraberMARCH 7, 2017

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 1 of 8

Page 32: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Optimizing Constrained Functions

Theorem: Lagrange Multiplier Method

Given functions f (x1, . . .xn) and g(x1, . . .xn), the critical points of frestricted to the set g = 0 are solutions to equations:

∂ f

∂ xi(x1, . . .xn) =λ

∂ g

∂ xi(x1, . . .xn) ∀i

g(x1, . . .xn) = 0

This is n+1 equations in the n+1 variables x1, . . .xn,λ.

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 2 of 8

Page 33: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Lagrange Example

Maximize `(x ,y) =p

xy subject to the constraint 20x +10y = 200.

• Compute derivatives

∂ `

∂ x=

1

2

s

y

x

∂ g

∂ x= 20

∂ `

∂ y=

1

2

√x

y

∂ g

∂ y= 10

• Create new systems of equations

1

2

s

y

x= 20λ

1

2

√x

y= 10λ

20x +10y = 200

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 3 of 8

Page 34: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Lagrange Example

Maximize `(x ,y) =p

xy subject to the constraint 20x +10y = 200.

• Compute derivatives

∂ `

∂ x=

1

2

s

y

x

∂ g

∂ x= 20

∂ `

∂ y=

1

2

√x

y

∂ g

∂ y= 10

• Create new systems of equations

1

2

s

y

x= 20λ

1

2

√x

y= 10λ

20x +10y = 200

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 3 of 8

Page 35: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Lagrange Example

Maximize `(x ,y) =p

xy subject to the constraint 20x +10y = 200.

• Compute derivatives

∂ `

∂ x=

1

2

s

y

x

∂ g

∂ x= 20

∂ `

∂ y=

1

2

√x

y

∂ g

∂ y= 10

• Create new systems of equations

1

2

s

y

x= 20λ

1

2

√x

y= 10λ

20x +10y = 200

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 3 of 8

Page 36: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Lagrange Example

Maximize `(x ,y) =p

xy subject to the constraint 20x +10y = 200.

• Compute derivatives

∂ `

∂ x=

1

2

s

y

x

∂ g

∂ x= 20

∂ `

∂ y=

1

2

√x

y

∂ g

∂ y= 10

• Create new systems of equations

1

2

s

y

x= 20λ

1

2

√x

y= 10λ

20x +10y = 200

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 3 of 8

Page 37: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Lagrange Example

• Dividing the first equation by the second gives us

y

x= 2 (1)

• which means y = 2x , plugging this into the constraint equation gives:

20x +10(2x) = 200

x = 5⇒ y = 10

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 4 of 8

Page 38: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Discrete Distribution: Multinomial

• Recall the mass function (N is total number of observations, xi is thenumber for each cell, θi probability of cell)

p(~x | ~θ ) =N!∏

i xi !

θ xii (2)

• Taking the log makes math easier, doesn’t change answer (monotonic)

• If we observe x1 . . .xN , then log likelihood is

`( ~θ )≡ log(n!)−∑

i

log(xi !)+∑

i

xi logθi (3)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 5 of 8

Page 39: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Discrete Distribution: Multinomial

• Recall the mass function (N is total number of observations, xi is thenumber for each cell, θi probability of cell)

p(~x | ~θ ) =N!∏

i xi !

θ xii (2)

• Taking the log makes math easier, doesn’t change answer (monotonic)

• If we observe x1 . . .xN , then log likelihood is

`( ~θ )≡ log(n!)−∑

i

log(xi !)+∑

i

xi logθi (3)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 5 of 8

Page 40: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Discrete Distribution: Multinomial

• Recall the mass function (N is total number of observations, xi is thenumber for each cell, θi probability of cell)

p(~x | ~θ ) =N!∏

i xi !

θ xii (2)

• Taking the log makes math easier, doesn’t change answer (monotonic)

• If we observe x1 . . .xN , then log likelihood is

`( ~θ )≡ log(n!)−∑

i

log(xi !)+∑

i

xi logθi (3)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 5 of 8

Page 41: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Discrete Distribution: Multinomial

• Recall the mass function (N is total number of observations, xi is thenumber for each cell, θi probability of cell)

p(~x | ~θ ) =N!∏

i xi !

θ xii (2)

• Taking the log makes math easier, doesn’t change answer (monotonic)

• If we observe x1 . . .xN , then log likelihood is

`( ~θ )≡ log(n!)−∑

i

log(xi !)+∑

i

xi logθi (3)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 5 of 8

Page 42: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Discrete Distribution: Multinomial

• Recall the mass function (N is total number of observations, xi is thenumber for each cell, θi probability of cell)

p(~x | ~θ ) =N!∏

i xi !

θ xii (2)

• Taking the log makes math easier, doesn’t change answer (monotonic)

• If we observe x1 . . .xN , then log likelihood is

`( ~θ )≡ log(n!)−∑

i

log(xi !)+∑

i

xi logθi (3)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 5 of 8

Page 43: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Multinomial θ

`( ~θ ) = log(N!)−∑

i

log(xi !)+∑

i

xi logθi +λ

1−∑

i

θi

(4)

(5)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 6 of 8

Page 44: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Multinomial θ

`( ~θ ) = log(N!)−∑

i

log(xi !)+∑

i

xi logθi +λ

1−∑

i

θi

(4)

(5)

Where did this come from? Constraint that ~θ must be a distribution.

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 6 of 8

Page 45: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Multinomial θ

`( ~θ ) = log(N!)−∑

i

log(xi !)+∑

i

xi logθi +λ

1−∑

i

θi

(4)

(5)

• ∂ `∂ θi

= xiθi−λ

• ∂ `∂ λ = 1−∑

i θi

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 6 of 8

Page 46: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Multinomial θ

`( ~θ ) = log(N!)−∑

i

log(xi !)+∑

i

xi logθi +λ

1−∑

i

θi

(4)

(5)

• ∂ `∂ θi

= xiθi−λ

• ∂ `∂ λ = 1−∑

i θi

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 6 of 8

Page 47: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Multinomial θ

`( ~θ ) = log(N!)−∑

i

log(xi !)+∑

i

xi logθi +λ

1−∑

i

θi

(4)

(5)

• ∂ `∂ θi

= xiθi−λ

• ∂ `∂ λ = 1−∑

i θi

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 6 of 8

Page 48: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Multinomial θ

`( ~θ ) = log(N!)−∑

i

log(xi !)+∑

i

xi logθi +λ

1−∑

i

θi

(4)

(5)

• ∂ `∂ θi

= xiθi−λ

• ∂ `∂ λ = 1−∑

i θi

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 6 of 8

Page 49: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Multinomial θ

• We have system of equations

θ1 =x1

λ(6)

...... (7)

θK =xK

λ(8)

i

θi =1 (9)

• So let’s substitute the first K equations into the last:∑

i

xi

λ= 1 (10)

• So λ=∑

i xi =N, and θi =xiN

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 7 of 8

Page 50: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Multinomial θ

• We have system of equations

θ1 =x1

λ(6)

...... (7)

θK =xK

λ(8)

i

θi =1 (9)

• So let’s substitute the first K equations into the last:∑

i

xi

λ= 1 (10)

• So λ=∑

i xi =N, and θi =xiN

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 7 of 8

Page 51: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Multinomial θ

• We have system of equations

θ1 =x1

λ(6)

...... (7)

θK =xK

λ(8)

i

θi =1 (9)

• So let’s substitute the first K equations into the last:∑

i

xi

λ= 1 (10)

• So λ=∑

i xi =N,

and θi =xiN

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 7 of 8

Page 52: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

MLE of Multinomial θ

• We have system of equations

θ1 =x1

λ(6)

...... (7)

θK =xK

λ(8)

i

θi =1 (9)

• So let’s substitute the first K equations into the last:∑

i

xi

λ= 1 (10)

• So λ=∑

i xi =N, and θi =xiN

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 7 of 8

Page 53: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Maximum Likelihood Estimation

INFO-2301: Quantitative Reasoning 2Michael Paul and Jordan Boyd-GraberMARCH 7, 2017

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 1 of 6

Page 54: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Big Pictures

• Ran through several common examples

• For existing distributions you can (and should) look up MLE

• For new models, you can’t (foreshadowing of later in class)

◦ Classification models◦ Unsupervised models (Expectation-Maximization)

• Not always so easy

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 2 of 6

Page 55: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Big Pictures

• Ran through several common examples

• For existing distributions you can (and should) look up MLE

• For new models, you can’t (foreshadowing of later in class)

◦ Classification models◦ Unsupervised models (Expectation-Maximization)

• Not always so easy

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 2 of 6

Page 56: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Classification

• Classification can be viewed asp(y |x ,θ )

• Have x ,y , need θ

• Discovering θ is also problem ofMLE

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 3 of 6

Page 57: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Clustering

• Clustering can be viewed asp(x |z,θ )

• Have x , need z,θ

• z is guessed at iteratively(Expectation)

• θ estimated to maximizelikelihood (Maximization)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 4 of 6

Page 58: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Not always so easy: Bias

• An estimator is biased if E�

θ̂�

6= θ• We won’t prove it, but the estimate for variance is biased

• Comes from estimating µ, so need to “shrink” variance

σ̂2 =1

N −1

i

(xi −µ)2 (1)

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 5 of 6

Page 59: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning

Not always so easy: Intractable Likelihoods

• Not always possible to “solve for” optimal estimator

• Use gradient optimization (we’ll see this for logistic regression)

• Use other approximations (e.g., Monte Carlo sampling)

• Whole subfield of statistics / information science

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 6 of 6