
Page 1: STAT 534: Statistical Computing

STAT 534: Statistical Computing

Hari Narayanan
[email protected]

Page 2: STAT 534: Statistical Computing

Course objectives

• Write programs in R and C tailored to the specifics of the statistics problems you want to solve

• Familiarize yourself with:
  – optimization techniques
  – Markov chain Monte Carlo (MCMC)

Page 3: STAT 534: Statistical Computing

Logistics

• Class:
  – Tuesdays and Thursdays, 12:00pm – 1:20pm

• Office hours:
  – Thursday 2:30pm – 4pm (Padelford B-301) or by appointment

• Textbooks:
  – Robert & Casella: Introducing Monte Carlo Methods with R
  – Kernighan & Ritchie: The C Programming Language

• Evaluation:
  – 4 assignments
  – 2 quizzes
  – Final project

Page 4: STAT 534: Statistical Computing

Note to students

• These slides contain more material than was covered in class.
• The main source, http://pages.uoregon.edu/dlevin/MARKOV/markovmixing.pdf,
  is referred to at http://faculty.washington.edu/harin/stat534-lectures.html

Page 5: STAT 534: Statistical Computing

Review of MCMC done so far

• Natural questions
  1. What is the Monte Carlo method?
     An elementary example:
     (a) finding the area of an irregular region in the plane
     (b) integrating a nonnegative function over an area

Page 6: STAT 534: Statistical Computing

(a) Area of an irregular region

• Bound the region by a rectangle
• Pick points uniformly within the rectangle
• Area of region / area of rectangle ≈ (number inside) / (total number of points)
• Example: (area of a circle of radius 1) / (area of a square of side 2) = π/4

Page 7: STAT 534: Statistical Computing

Approximating Pi

• Draw a circle with radius 1 inside a 2x2 square, [-1,1] x [-1,1]
• Pick points (x, y) by taking
  x <- runif(n, min = -1, max = 1); y <- runif(n, min = -1, max = 1)
• Let int be the number of points which fell inside the circle (x^2 + y^2 <= 1).
  Then approx_pi = 4 * int / n
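A minimal runnable version of this in R (the sample size n and the name approx_pi are illustrative choices, not part of the original slide):

  # Monte Carlo estimate of pi: sample points uniformly in the square [-1,1] x [-1,1]
  # and count the fraction that land inside the unit circle (that fraction estimates pi/4).
  set.seed(1)
  n <- 100000
  x <- runif(n, min = -1, max = 1)
  y <- runif(n, min = -1, max = 1)
  inside <- sum(x^2 + y^2 <= 1)     # number of points inside the circle
  approx_pi <- 4 * inside / n
  approx_pi                         # roughly 3.14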

Page 8: STAT 534: Statistical Computing

Integrating a nonnegative function over an area

• Pick points according to a suitable distribution

How to pick points according to a given distribution?
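As one simple illustration, here is a hedged R sketch that integrates a nonnegative function over the unit square by averaging it at points drawn from the uniform distribution, which plays the role of the "suitable distribution" here; the function g and sample size n are made up for the example:

  # Estimate the integral of g(x, y) = exp(-(x^2 + y^2)) over the unit square [0,1]^2
  # by averaging g at points drawn uniformly from the square (the square has area 1,
  # so the sample mean of g estimates the integral, about 0.558).
  set.seed(1)
  g <- function(x, y) exp(-(x^2 + y^2))
  n <- 100000
  x <- runif(n)
  y <- runif(n)
  mean(g(x, y))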

Page 9: STAT 534: Statistical Computing

Picking points randomly

• For the area of an irregular region, pick points uniformly at random within a larger regular region and count how many fall within the region in question.

• For finding the integral of a nonnegative function over an irregular region, pick points according to a suitable distribution.

How to pick points according to a given distribution?

Page 10: STAT 534: Statistical Computing

The Markov chain method

Pick a Markov chain whose steady state distribution is the desired distribution.

Start from any point and let the Markov chain run for some time.

Pick the point you reach. You will have picked the point you reached with approximately the steady-state probability for that point.
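A small R sketch of this idea, using a hypothetical 3-state chain whose transition matrix is made up for illustration: run the chain for many steps and compare the empirical distribution of visited states with the stationary distribution.

  # Run a hypothetical 3-state Markov chain for many steps and compare the empirical
  # distribution of visited states with the chain's stationary distribution.
  set.seed(1)
  P <- matrix(c(0.5, 0.3, 0.2,
                0.2, 0.6, 0.2,
                0.3, 0.3, 0.4), nrow = 3, byrow = TRUE)
  n_steps <- 50000
  state <- 1                                           # start from an arbitrary state
  visits <- integer(n_steps)
  for (t in 1:n_steps) {
    state <- sample(1:3, size = 1, prob = P[state, ])  # one step of the chain
    visits[t] <- state
  }
  table(visits) / n_steps                              # empirical distribution of visited states
  # For comparison, the stationary distribution is the left eigenvector of P for eigenvalue 1:
  v <- Re(eigen(t(P))$vectors[, 1])
  v / sum(v)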

Page 11: STAT 534: Statistical Computing

Markov Chain + filter = New MC

• Here you begin with a certain Markov chain with a symmetric transition matrix
• But you make your move to the next point with an additional condition of acceptance or rejection

Page 12: STAT 534: Statistical Computing

Details of the Metropolis Filter

• Let T be a symmetric transition matrix. (T is reversible with respect to the uniform probability distribution on the sample space Ω.)

We will modify the transitions made according to T to obtain a chain with stationary distribution π


Page 13: STAT 534: Statistical Computing

Details of Metropolis Filter

• Evolution of the new chain:
• When at state x, a candidate move is generated from T(x, ·).
• If the proposed new state is y, then the move is censored with probability 1 - a(x,y); equivalently, with probability a(x,y) the new state is the proposed state y, and with probability 1 - a(x,y) the chain remains at x.

Note: this is a new Markov chain with transition matrix, say P.

Page 14: STAT 534: Statistical Computing

What is the new transition matrix?

• P(x,y) = T(x,y) a(x,y)                           if y ≠ x
         = 1 - ∑ over z ≠ x of T(x,z) a(x,z)       if y = x

We know transition matrix P has stationary distribution π if

π(x)T(x,y)a(x,y) = π(y)T(y,x)a(y,x) for all x not equal to y.

Page 15: STAT 534: Statistical Computing

Transition matrix P has stationary distribution π if π(x)T(x,y)a(x,y)= π(y)T(y,x)a(y,x) for all x not equal to y.

• Proof: For π to be the stationary distribution of transition matrix P(x,y), we need πP = π, i.e. ∑ over x of π(x)P(x,y) = π(y).
  For y ≠ x, take P(x,y) = T(x,y)a(x,y). Then
  π(x)P(x,y) = π(x)T(x,y)a(x,y) = π(y)T(y,x)a(y,x) = π(y)P(y,x),
  and this relation also holds trivially when x = y.
  So ∑ over x of π(x)P(x,y) = ∑ over x of π(y)P(y,x) = π(y) (∑ over x of P(y,x)) = π(y), since ∑ over x of P(y,x) = 1.

Page 16: STAT 534: Statistical Computing

Back to the filter

• The acceptance probability of a move is a(x,y).
• Rejection of moves slows the chain.
• So make a(x,y) as large as possible.
• How large can it be?
• We will force the `nice’ condition
  π(x)T(x,y)a(x,y) = π(y)T(y,x)a(y,x),
  i.e., since T is symmetric, π(x)a(x,y) = π(y)a(y,x).
• Since a(x,y) and a(y,x) are probabilities (at most 1),
  π(x)a(x,y) = π(y)a(y,x) <= min(π(x), π(y)).
• So a(x,y) <= min(π(y)/π(x), 1).

Page 17: STAT 534: Statistical Computing

Metropolis chain of a Markov chain

• T(x,y): transition matrix of the original Markov chain
• P(x,y): transition matrix of the Metropolis chain, defined as follows:
  P(x,y) = T(x,y) min(π(y)/π(x), 1)            if y ≠ x
         = 1 - ∑ over z ≠ x of P(x,z)          if y = x
• For this P(x,y), π is a stationary distribution.
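A small numerical check of this claim in R, with a symmetric proposal matrix and a target π chosen purely for illustration (the matrix is named T_mat to avoid clashing with R's built-in T):

  # Build the Metropolis transition matrix P from a symmetric proposal T_mat and a
  # target pi, then check numerically that pi P = pi.
  T_mat <- matrix(c(0.0, 0.5, 0.5,
                    0.5, 0.0, 0.5,
                    0.5, 0.5, 0.0), nrow = 3, byrow = TRUE)   # symmetric proposal chain
  pi_target <- c(0.2, 0.3, 0.5)                               # desired stationary distribution
  n <- nrow(T_mat)
  P <- matrix(0, n, n)
  for (x in 1:n) {
    for (y in 1:n) {
      if (y != x) P[x, y] <- T_mat[x, y] * min(pi_target[y] / pi_target[x], 1)
    }
    P[x, x] <- 1 - sum(P[x, -x])    # leftover probability: the chain stays at x
  }
  as.vector(pi_target %*% P)        # equals (0.2, 0.3, 0.5), so pi is stationary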

Page 18: STAT 534: Statistical Computing

Why all this complication?

• Note that the Metropolis chain is `just another Markov Chain’. So why not start with it right away, instead of starting with some other Markov chain and then doing this acceptance/rejection?

Page 19: STAT 534: Statistical Computing

Reasons for the indirect procedure

1. It may be difficult to build a Markov chain directly for a given stationary distribution π.

2. The Metropolis chain depends only on the ratio π(y)/π(x). Suppose we know π only up to some normalizing constant M which is difficult to compute; the Metropolis chain does not care what M is.
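A tiny R illustration of point 2, with arbitrary unnormalized weights w standing in for M·π: the acceptance probability min(π(y)/π(x), 1) is the same whether or not we divide by M.

  # The acceptance probability min(pi(y)/pi(x), 1) is unchanged when pi is known
  # only up to a normalizing constant M (here M = sum(w)).
  w <- c(2, 7, 1)                             # unnormalized weights; pi = w / M
  M <- sum(w)
  a_unnorm <- min(w[1] / w[2], 1)             # move from state 2 to state 1, using w only
  a_norm   <- min((w[1] / M) / (w[2] / M), 1) # the same move, using the normalized pi
  all.equal(a_unnorm, a_norm)                 # TRUE (up to floating point): M cancels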

Page 20: STAT 534: Statistical Computing

Numbers assigned to vertices of a graph. Find max value vertex

• If you `greedily’ move from any point to a higher neighbour, you can get trapped at a local maximum.
• Consider a regular graph. The random walk on it has a symmetric transition matrix.
• For k >= 1, define πk(x) = k^(f(x))/Z(k), where f(x) is the number assigned to vertex x and Z(k) = ∑ over all x of k^(f(x)), making πk(x) a probability measure.

Page 21: STAT 534: Statistical Computing

Suppose the number of vertices is large

• A Markov chain corresponding to πk is difficult to construct directly because Z(k) is difficult to compute.

• But the Metropolis chain built from the `random walk Markov Chain’ accepts a proposed move from x to y with probability min(k^(f(y) - f(x)), 1).

We will show that for large k this randomized algorithm ultimately gets to a maximum-value vertex.
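Below is a hedged R sketch of this procedure on a small cycle graph with made-up vertex values f (the graph, k, and step counts are all illustrative, not from the slides). Larger k makes the chain spend a larger fraction of its time at the maximum, at the cost of escaping local maxima more slowly.

  # Metropolis chain targeting pi_k(x) proportional to k^f(x), where f assigns a value
  # to each vertex of a 10-vertex cycle and the proposal is a simple random walk.
  set.seed(1)
  f <- c(1, 3, 2, 5, 4, 2, 6, 3, 2, 1)   # made-up vertex values; the maximum (6) is at vertex 7
  n_v <- length(f)
  k <- 5                                  # larger k concentrates pi_k more sharply on the maximum
  neighbours <- function(x) c(ifelse(x == 1, n_v, x - 1), ifelse(x == n_v, 1, x + 1))
  x <- 1                                  # start anywhere
  n_steps <- 20000
  path <- integer(n_steps)
  for (t in 1:n_steps) {
    y <- sample(neighbours(x), 1)                    # propose a random-walk move
    if (runif(1) < min(k^(f[y] - f[x]), 1)) x <- y   # Metropolis accept/reject
    path[t] <- x
  }
  mean(f[path[-(1:2000)]] == max(f))   # fraction of time, after burn-in, at a max-value vertex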

Page 22: STAT 534: Statistical Computing

Proof that Metropolis method hits max value vertex

Since πk(x) = k^(f(x))/Z(k), this distribution favours vertices with large values of f(x). Let f* be the maximum value that a vertex has. Define Ω* = the collection of all vertices with the maximum value.

• Now πk(x) can be rewritten as (k^(f(x))/k^(f*)) / (Z(k)/k^(f*)).
• Z(k)/k^(f*) can be written as |Ω*| (the terms with k^(f*)/k^(f*) = 1) plus the sum over the remaining vertices of k^(f(x))/k^(f*) = k^(f(x) - f*).
• Each remaining term has f(x) - f* < 0, so it tends to 0 as k grows.
• For large k, therefore, πk(x) converges to 1/|Ω*| for each vertex in Ω*,
• i.e. with probability tending to 1 we are at one of the vertices with maximum value.
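A short R illustration of this limit, reusing the same made-up vertex values f as in the earlier sketch: as k grows, πk places essentially all of its mass on the vertices in Ω*.

  # pi_k(x) = k^f(x) / Z(k): as k increases, the mass concentrates on the argmax of f.
  f <- c(1, 3, 2, 5, 4, 2, 6, 3, 2, 1)
  pi_k <- function(k) { w <- k^f; w / sum(w) }
  round(pi_k(2), 3)               # noticeable mass still on non-maximal vertices
  round(pi_k(100), 3)             # essentially all mass on vertex 7, where f is maximal
  sum(pi_k(100)[f == max(f)])     # probability of the max-value set Omega*, close to 1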