
An Introduction to Simulation in the Social Sciences


Page 1: An Introduction to Simulation in the Social Sciences

By Francis Smart

Michigan State University
Agricultural, Food, and Resource Economics
Measurement and Quantitative Methods

www.econometricsbysimulation.com

AN INTRODUCTION TO SIMULATION DESIGN IN THE SOCIAL SCIENCES

Page 2: An Introduction to Simulation in the Social Sciences

Why Simulate?

Simulation is a detailed thought experiment. It can be used to:

1. Confirm theoretical results.

2. Explore unknown theoretical environments.

3. Generate statistical estimates.

Page 3: An Introduction to Simulation in the Social Sciences

Why Simulate?

1. Confirmatory results:
   a. Develop theory
   b. Design simulation
   c. Get results
   d. Sensitivity analysis

2. Exploratory analysis:
   a. Develop simulation
   b. Get results
   c. Develop theory
   d. Sensitivity analysis

3. Statistical estimators:
   a. Bootstrap
   b. Markov Chain Monte Carlo (Bayesian)

Page 4: An Introduction to Simulation in the Social Sciences

Some examples

• Confirmatory:
  1. Econometrician – proposes a new estimator and demonstrates its performance.
  2. Psychometrician – proposes a new item response function and demonstrates its performance.

• Exploratory:
  1. Econometrician – tests the performance of a consistent estimator on a small sample.
  2. Epidemiologist – explores the effects of different levels of mosquito net usage in a dynamic infection model.
  3. Educational researcher – wonders about the best way to estimate teacher ability when students are non-randomly assigned.

Page 5: An Introduction to Simulation in the Social Sciences

Simulation Stages

All simulations can be broken down into a series of discrete stages.

1. Specify Model: Choose a theoretical paradigm.

2. Assign Parameters: Survey the literature, calibrate the model, or draw from real data.

3. Generate Data: Most simulations generate a data set every time they run.

4. Calculate/Store Indicators: Know what indicators you need and develop methods for generating those indicators.

5. Compute Results: Compute summary statistics from the collection of indicators and the parameters that generated them.

Repeat
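The stages above can be sketched as a short R script. This is only an illustration under assumed values: the linear model, the parameter values, and the names b0, b1, n_obs, n_reps, and b1_hat are hypothetical choices, not taken from the slides.

```r
# Minimal sketch of the simulation stages; all names and values are illustrative.
set.seed(101)                       # make the run reproducible

# 1. Specify model: y = b0 + b1 * x + u (a simple linear model)
# 2. Assign parameters (hypothetical values)
b0 <- 5
b1 <- 2
n_obs <- 100
n_reps <- 500

# Storage for the indicator produced by each repetition
b1_hat <- numeric(n_reps)

for (r in seq_len(n_reps)) {        # Repeat
  # 3. Generate data
  x <- rnorm(n_obs)
  u <- rnorm(n_obs)
  y <- b0 + b1 * x + u

  # 4. Calculate and store the indicator (here, the OLS slope)
  b1_hat[r] <- coef(lm(y ~ x))[["x"]]
}

# 5. Compute results from the stored indicators and the known parameter
results <- c(mean = mean(b1_hat), sd = sd(b1_hat), bias = mean(b1_hat) - b1)
print(round(results, 3))
```

Keeping the true parameter (b1) next to the stored indicators is what makes the final summary possible: bias can only be computed because the simulation knows the value that generated the data.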

Page 6: An Introduction to Simulation in the Social Sciences

1. Specify Model

• Identify underlying model (theoretical paradigm)
This is usually obvious from the discipline you are in, though it is not uncommon for simulations to be interdisciplinary in nature.

• Identify minimum required complexity
Generally, the simpler the model in which you can test or demonstrate your theory, the better. The more complex your model, the more room for uncertainty about what is driving your results.

Page 7: An Introduction to Simulation in the Social Sciences

Choice of Environment

Stata or R*

1. Most people will have a previously defined preference.

2. Simple simulations are often easier in Stata because of built-in commands like “simulate”.

3. Simulations handling multiple agents, multiple data sets, or complex relationships are often easier in R.

4. Stata is to Accounting like R is to Tetris.

* There are many other programming languages suitable for simulation studies. These are the two which I know well.

Page 8: An Introduction to Simulation in the Social Sciences

2. Assign Parameters

• Survey the literature for reasonable model parameters.

• Estimate reasonable model parameters from available data.

• Where a parameter choice has no theoretical backing, make a reasonable argument for it.

• Allow some parameters to vary, either gradually or randomly.
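As a rough sketch of the last point, parameters can be varied gradually with a grid or randomly with draws from a plausible range. The parameter names net_usage and n_obs and their ranges are hypothetical, chosen only for illustration.

```r
# Two ways to let parameters vary; names and ranges are illustrative only.
set.seed(202)

# Gradual variation: every combination of a grid of values
param_grid <- expand.grid(
  net_usage = seq(0, 1, by = 0.25),   # hypothetical share of households using nets
  n_obs     = c(50, 500)              # hypothetical sample sizes
)

# Random variation: draw each parameter from a plausible range
param_random <- data.frame(
  net_usage = runif(10, min = 0, max = 1),
  n_obs     = sample(c(50, 500), size = 10, replace = TRUE)
)

print(param_grid)
print(param_random)
```

Each row of either table would then drive one run (or one batch of repetitions) of the simulation.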

Page 9: An Introduction to Simulation in the Social Sciences

Model Calibration

• Typically there are parameters for which no estimates are available.

• Adjust these parameters to calibrate the model so that it produces believable and desirable outcomes.

• For instance: In the malaria transmission simulation we varied mosquito speed and malaria resistance rates to achieve a desired infection rate among the general population of 15-30% at steady state.
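Calibration can be sketched as a root-finding problem: adjust a free parameter until the model output hits the target. The toy infection model below is an assumption made for illustration and is not the malaria simulation described on this slide; beta, gamma, and steady_state_rate are hypothetical names.

```r
# Toy calibration sketch. The model is a simple deterministic infection
# process, NOT the malaria simulation from the slides:
#   infected <- infected + beta * infected * (1 - infected) - gamma * infected
steady_state_rate <- function(beta, gamma = 0.1, n_rounds = 2000) {
  infected <- 0.01                       # initial infected share
  for (round in seq_len(n_rounds)) {
    infected <- infected + beta * infected * (1 - infected) - gamma * infected
  }
  infected                               # share infected after many rounds
}

target_rate <- 0.20                      # inside the 15-30% band

# Search for the transmission parameter that reproduces the target rate
fit <- uniroot(function(beta) steady_state_rate(beta) - target_rate,
               interval = c(0.11, 0.5), tol = 1e-8)
calibrated_beta <- fit$root

print(calibrated_beta)
print(steady_state_rate(calibrated_beta))
```

In a stochastic simulation the model output is noisy, so the same idea usually needs averaging over repetitions or a coarse grid search rather than a precise root finder.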

Page 10: An Introduction to Simulation in the Social Sciences

3. Generate Data

• Draw from theoretical distributions.

• Resample from available data (bootstrapping, for instance).

• Sort or organize data.

Distribution   Stata            R
Normal         rnormal()        rnorm()
Uniform        runiform()       runif()
Poisson        rpoisson()       rpois()
Bernoulli      rbinomial(1,…)   rbinom(…,1,…)
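The three data-generating steps can be sketched in R as follows. The sample size and distribution parameters are arbitrary illustrative values; note that base R draws Bernoulli variables with rbinom() and size = 1.

```r
# Sketch of the three data-generating steps; all values are illustrative.
set.seed(303)
n <- 1000

# Draw from theoretical distributions
draws <- data.frame(
  normal    = rnorm(n),                        # standard normal
  uniform   = runif(n),                        # uniform on [0, 1]
  poisson   = rpois(n, lambda = 3),            # Poisson with mean 3
  bernoulli = rbinom(n, size = 1, prob = 0.3)  # Bernoulli with p = 0.3
)

# Resample from available data (rows drawn with replacement)
boot_rows   <- sample(n, size = n, replace = TRUE)
boot_sample <- draws[boot_rows, ]

# Sort or organize data
sorted <- draws[order(draws$normal), ]

print(head(sorted))
```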

Page 11: An Introduction to Simulation in the Social Sciences

Random Seed

• Most programs are incapable of generating truly random numbers.

• Often, truly random numbers are undesirable.

• If the numbers are truly random, then results cannot be duplicated.

• Setting the seed makes the program generate exactly the same ‘random’ values on every run. Thus results do not change.
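A small R illustration of the point about seeds. The seed value 42 is arbitrary.

```r
# Resetting the seed reproduces the same 'random' draws.
set.seed(42)
first <- rnorm(5)

set.seed(42)
second <- rnorm(5)

third <- rnorm(5)   # seed not reset, so the stream simply continues

print(identical(first, second))  # same seed, same draws
print(identical(second, third))  # different point in the stream
```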

Page 12: An Introduction to Simulation in the Social Sciences

Calculate results

• Know what results are needed for confirmation of your theory. For example:

1. The benefit of bednet usage is greater than the cost of bednets.
2. The estimator is unbiased.
3. Estimates from one estimator are better than those from another.

• Know what results are needed for confirmation that the simulation is working properly. For example:

1. Students should only have one teacher per grade.
2. The skewness of the explanatory variable should be less than that of the dependent variable.
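A check like "one teacher per grade" can be written as a small function and run on every generated data set. The data layout (columns student, grade, teacher) and the function name are assumptions made for this sketch.

```r
# Sanity check sketch: each student should have exactly one teacher per grade.
# The column names and the toy data are hypothetical.
one_teacher_per_grade <- function(assignments) {
  # TRUE when no (student, grade) pair appears more than once
  !any(duplicated(assignments[, c("student", "grade")]))
}

set.seed(404)
good <- expand.grid(student = 1:30, grade = 1:3)
good$teacher <- sample(c("A", "B", "C"), nrow(good), replace = TRUE)

# Introduce a deliberate bug: student 1 gets a second teacher in grade 1
bad <- rbind(good, data.frame(student = 1, grade = 1, teacher = "B"))

print(one_teacher_per_grade(good))
print(one_teacher_per_grade(bad))
```

Testing the check against deliberately broken data, as with the bad data set here, is a cheap way to confirm that the check itself works.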

Page 13: An Introduction to Simulation in the Social Sciences

Repeat

• This may seem like a trivial task but it is not. Repetition is essential in most simulations. It is generally unconvincing (and often uninformative) to run a simulation only once.

• Some people do not believe results of any simulation that is not repeated at least 1000 times.

• How one repeats a simulation and how one interprets the results of the collective set of repetitions are important questions. For example:

1. Does one count the number of times that a mosquito net is profitable to buy, or report the average return from purchasing mosquito nets?

2. Does one present the average of an estimator and its standard deviation, or does one present how frequently the true parameter falls within the confidence interval of the estimator?
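Both ways of summarizing repetitions in the second example can be computed from the same loop. This is a sketch under assumed values: the linear model, the true slope of 2, the sample size, and 1000 repetitions are illustrative choices.

```r
# Two summaries of the same repeated simulation; values are illustrative.
set.seed(505)
true_slope <- 2
n_obs  <- 50
n_reps <- 1000

estimates <- numeric(n_reps)
covered   <- logical(n_reps)

for (r in seq_len(n_reps)) {
  x <- rnorm(n_obs)
  y <- 5 + true_slope * x + rnorm(n_obs)
  fit <- lm(y ~ x)

  estimates[r] <- coef(fit)[["x"]]
  ci <- confint(fit, "x", level = 0.95)          # 1 x 2 matrix: lower, upper
  covered[r] <- ci[1] <= true_slope && true_slope <= ci[2]
}

summary_stats <- c(
  mean_estimate = mean(estimates),  # average of the estimator
  sd_estimate   = sd(estimates),    # its spread across repetitions
  coverage      = mean(covered)     # how often the CI contains the truth
)
print(round(summary_stats, 3))
```

The mean and standard deviation describe the estimator itself, while coverage describes whether the reported uncertainty can be trusted; the two can disagree, which is why the choice matters.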

Page 14: An Introduction to Simulation in the Social Sciences

Necessary Programming Tools

• Macros/scalar manipulation

• Data generating commands

• For/While loops

• The ability to store results after commands

Page 15: An Introduction to Simulation in the Social Sciences

Example Simulation

Stata: Simulate the result of errors correlated with the explanatory variable.

set more off
* Turn off the --more-- pause in the output (I have it set permanently off on my computer)

clear
* Clear the old data

set obs 1000
* Tell Stata you want 1000 observations available to be used for data generation.

gen x = rnormal()
* This is some random explanatory variable

Page 16: An Introduction to Simulation in the Social Sciences

Sort x and u

sort x
* Now the data is ordered from the smallest x to the largest x

gen id = _n
* This will count from 1 to 1000 so that each observation has a unique id

gen u = rnormal()
* u is the unobserved error in the model

sort u
* Now the data is ordered from the smallest u to the largest u

gen x2 = .
* We are going to match up the smallest u with the smallest x.

Page 17: An Introduction to Simulation in the Social Sciences

Force the correlation between x draws and the error to be positive.

* This will loop from 1 to 1000
forv i=1/1000 {
  replace x2 = x[`i'] if id[`i']==_n
}

drop x
rename x2 x

corr x u
/*
             |        x        u
-------------+------------------
           x |   1.0000
           u |   0.9980   1.0000
*/

Page 18: An Introduction to Simulation in the Social Sciences

Results

gen y = 5 + 2*x + u*5

reg y x

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  1,   998) =       .
       Model |  50827.8493     1  50827.8493           Prob > F      =  0.0000
    Residual |  55.8351723   998  .055947066           R-squared     =  0.9989
-------------+------------------------------           Adj R-squared =  0.9989
       Total |  50883.6844   999  50.9346191           Root MSE      =  .23653

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   7.145123   .0074963   953.15   0.000     7.130412    7.159833
       _cons |   4.858391   .0074869   648.92   0.000     4.843699    4.873083
------------------------------------------------------------------------------

* This shows that when the error is correlated with the explanatory variable, the OLS estimator can be severely biased: the true coefficient on x is 2, but the estimate is about 7.1.

Page 19: An Introduction to Simulation in the Social Sciences

Same simulation in R

x = sort(rnorm(1000))
u = sort(rnorm(1000))

y = 5 + 2*x + u*5

summary(lm(y~x))
# This simulation turns out to be extremely easy in R

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.75818    0.01281   371.5   <2e-16 ***
x            6.86977    0.01282   535.8   <2e-16 ***
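To see that the sorting is what drives the bias, the same draws can be used twice: once left independent and once sorted. This is a sketch; the helper name slope_of and the seed are hypothetical additions, while the model y = 5 + 2*x + u*5 is the one used above.

```r
# Compare OLS on independent draws with OLS on sorted (correlated) draws.
set.seed(909)
n <- 1000
x_raw <- rnorm(n)
u_raw <- rnorm(n)

# Hypothetical helper: build y from the model above and return the OLS slope
slope_of <- function(x, u) {
  y <- 5 + 2 * x + u * 5
  coef(lm(y ~ x))[["x"]]
}

slope_independent <- slope_of(x_raw, u_raw)              # x and u unrelated
slope_sorted      <- slope_of(sort(x_raw), sort(u_raw))  # x and u matched by rank
cor_sorted        <- cor(sort(x_raw), sort(u_raw))

print(c(independent = slope_independent,
        sorted      = slope_sorted,
        cor_sorted  = cor_sorted))
```

With independent draws the slope should land near the true value of 2, while matching x and u by rank pushes it toward 2 + 5 = 7, in line with the Stata and R output above.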

Page 20: An Introduction to Simulation in the Social Sciences

Multi-agent simulations

• These are simulations in which agents with specified command routines interact. Some result of that interaction is subsequently observed and stored for analysis.

• An example from my work is a recent project with Andrew Dillon in which we simulated an environment populated by both humans and mosquitos. The human population stayed constant while the mosquito population moved each round. Mosquitos had the chance of becoming infected with malaria or infecting humans with malaria. Each simulation ran for two hundred days (rounds), and the last thirty were used to calculate the returns from technology choice for the group that decided to use prevention technology at the beginning of the simulation relative to those who decided against prevention technology.
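A heavily simplified toy version of such an environment might look like the sketch below. It is not the model from the project described above: the grid, the population sizes, the transmission and recovery probabilities, and the rule that mosquitos are re-placed at random each round are all assumptions made for illustration, and it omits the prevention technology entirely.

```r
# Toy multi-agent sketch (NOT the project's model); all values are hypothetical.
set.seed(606)
n_humans <- 100
n_mosq   <- 200
n_cells  <- 100      # a 10 x 10 grid, cells numbered 1..100
n_rounds <- 200
p_infect_human <- 0.3    # chance an exposed human becomes infected
p_infect_mosq  <- 0.3    # chance an exposed mosquito becomes infected
p_recover      <- 0.05   # chance an infected human recovers each round

human_cell <- sample(n_cells, n_humans, replace = TRUE)  # humans stay put
human_sick <- rep(FALSE, n_humans)
mosq_sick  <- seq_len(n_mosq) <= 20                      # 20 mosquitos start infected

share_sick <- numeric(n_rounds)
for (round in seq_len(n_rounds)) {
  # Mosquitos move: each lands in a random cell
  mosq_cell <- sample(n_cells, n_mosq, replace = TRUE)

  # Agents sharing a cell with an infected agent of the other type are exposed
  exposed_humans <- !human_sick & human_cell %in% mosq_cell[mosq_sick]
  exposed_mosq   <- !mosq_sick  & mosq_cell  %in% human_cell[human_sick]

  # Recovery is decided before this round's new infections are applied
  recovering <- human_sick & runif(n_humans) < p_recover

  human_sick[exposed_humans] <- runif(sum(exposed_humans)) < p_infect_human
  mosq_sick[exposed_mosq]    <- runif(sum(exposed_mosq))   < p_infect_mosq
  human_sick[recovering]     <- FALSE

  share_sick[round] <- mean(human_sick)   # indicator stored each round
}

# Summarize the last thirty rounds, as in the description above
late_share <- mean(tail(share_sick, 30))
print(late_share)
```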

Page 21: An Introduction to Simulation in the Social Sciences

Multi-agent simulations: Error Checking

Multi-agent simulations are especially prone to errors. Develop error routines to check for bugs.

1. If assigning subjects to groups, make sure each subject belongs to exactly one group and all of the groups have equal numbers of subjects (if balanced).

2. If generating composite random variables, be sure the resulting random variables have reasonable ranges (probabilities cannot be less than 0 or greater than 1).
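The two checks above can be sketched as small error routines that stop the simulation when something is wrong. The function names, the data layout (columns subject and group), and the error messages are hypothetical.

```r
# Error-routine sketches; names and data layout are illustrative assumptions.
check_groups <- function(assignment, balanced = TRUE) {
  if (anyNA(assignment$group)) {
    stop("Some subjects have no group")
  }
  if (anyDuplicated(assignment$subject) > 0) {
    stop("Some subjects are assigned more than once")
  }
  group_sizes <- table(assignment$group)
  if (balanced && length(unique(as.vector(group_sizes))) != 1) {
    stop("Groups do not have equal numbers of subjects")
  }
  invisible(TRUE)
}

check_probability <- function(p) {
  if (any(is.na(p) | p < 0 | p > 1)) {
    stop("Probabilities must lie between 0 and 1")
  }
  invisible(TRUE)
}

# Usage on toy data
assignment <- data.frame(subject = 1:12, group = rep(c("a", "b", "c"), times = 4))
check_groups(assignment)                   # passes silently

set.seed(707)
composite <- plogis(rnorm(100) + rnorm(100))   # logistic transform stays in (0, 1)
check_probability(composite)               # passes silently
```

Calling these routines inside the simulation loop, right after the step that could go wrong, makes a bug surface at the round where it occurs rather than in the final results.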

Page 22: An Introduction to Simulation in the Social Sciences

Graphical error checks

• Generate graphical figures as a means of checking for errors.

[Figure: The simulation appears to be converging on a steady state.]

Page 23: An Introduction to Simulation in the Social Sciences

Statistical Estimators

• Bootstrap (case resampling): The bootstrap routine takes advantage of the assumption of random sampling. It is often used to estimate the variances of estimators.

• Markov Chain Monte Carlo (Bayesian Estimation): MCMC methods are a class of algorithms that construct a Markov chain whose equilibrium distribution is the desired distribution. In Bayesian estimation, the chain starts from a specified prior and is used to draw from the posterior distribution, which combines that prior with the information in the sample.
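Both estimators can be sketched in a few lines of R. Everything here is an illustrative assumption: the data are simulated, the bootstrap targets the standard error of a sample mean, and the MCMC part is a basic random-walk Metropolis sampler for a normal mean with known standard deviation and a normal prior, which is only one of many MCMC algorithms.

```r
# Sketches of a case-resampling bootstrap and a random-walk Metropolis sampler.
set.seed(808)
sample_data <- rnorm(200, mean = 1, sd = 2)   # simulated 'observed' data

# --- Bootstrap: resample cases with replacement, recompute the statistic ---
n_boot <- 2000
boot_means <- replicate(n_boot, mean(sample(sample_data, replace = TRUE)))
boot_se     <- sd(boot_means)
analytic_se <- sd(sample_data) / sqrt(length(sample_data))

# --- Metropolis: posterior of mu, with y ~ N(mu, 2^2) and prior mu ~ N(0, 10^2) ---
log_posterior <- function(mu) {
  sum(dnorm(sample_data, mean = mu, sd = 2, log = TRUE)) +
    dnorm(mu, mean = 0, sd = 10, log = TRUE)
}

n_iter <- 5000
chain  <- numeric(n_iter)
current    <- 0
current_lp <- log_posterior(current)

for (i in seq_len(n_iter)) {
  proposal    <- current + rnorm(1, sd = 0.3)    # random-walk proposal
  proposal_lp <- log_posterior(proposal)
  # Accept with probability min(1, posterior ratio)
  if (log(runif(1)) < proposal_lp - current_lp) {
    current    <- proposal
    current_lp <- proposal_lp
  }
  chain[i] <- current
}
posterior_draws <- chain[-(1:1000)]              # drop burn-in

print(c(boot_se = boot_se, analytic_se = analytic_se))
print(c(posterior_mean = mean(posterior_draws), sample_mean = mean(sample_data)))
```

For the sample mean the bootstrap standard error should be close to the usual analytic one, which makes this a convenient case for checking the routine before applying it to statistics with no simple formula.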

Page 24: An Introduction to Simulation in the Social Sciences

For Additional Reference

• For many more examples of simulations in R and Stata, go to www.econometricsbysimulation.com