Management Science (G12 2001/02) Part I (Michaelmas): Uncertainty Part II (Lent): Choice

Management Science (G12 2001/02)

Part I (Michaelmas): Uncertainty Part II (Lent): Choice

Today’s plan

What is Management Science and what’s the purpose of this course?

• Spreadsheets as a modelling platform

• Introduction to Monte Carlo Simulation

• How can a computer roll a die?

Management Science• … uses computer technology to help

managers make good decisions and optimise organisational processes

• Operations Research and Decision Science are other names

• Professional bodies– International: Institute for Operations

Research and the Management Sciences (INFORMS) (www.informs.org)

– UK: Operational Research Society (www.orsoc.org.uk)

Core issues of Management Science

• Problem areas– decision making – Organisational process design and

control

• Core issues– complexity– uncertainty – choice (optimality)

• Methodology: – Modelling (graphical, computer, or

mathematical models)

Some important side issues

• Models as communication devices– Presentation of quantitative

information

• Quantitative (“hard”) versus qualitative (“soft”) modelling– The “human factor”

• Management information systems – Availability and reliability of data

Aim of the course• Make you aware of the potential (and

limitations) of the Management Science approach

• Provide you with some hands-on experience in modelling (coursework)

• Discuss theoretical underpinning of some important modelling templates (exam)

Michaelmas term: Uncertainty

• Computer models of uncertainty– Monte Carlo simulation

• Mathematical models of uncertainty– Probability theory and stochastic processes

• Modelling templates– Queuing systems– Markov chains– Others as time permits

• Analysing dependencies between uncertainties– Linear regression

• Forecasting the future

Supervisions and coursework

• Supervision 1: Simulation in spreadsheets

• Supervision 2: Stochastic processes• Supervision 3: Regression and

forecasting• Coursework: Analysis of a business

case using spreadsheet simulations (more later…)

Today’s plan

• What is Management Science and what’s the purpose of this course?

Spreadsheets as a modelling platform



Why spreadsheets? • A spreadsheet is a tool that allows you to

– store and present quantitative information– process quantitative information– perform what-if analyses– do much more…

• Spreadsheets have many disadvantages– Limited data structure (2-dimensional array) – Difficult to validate and document– Inflexible– Unreliable numerical routines

The Cons

And the Pros

Background Information• I assume that you are familiar with basic

spreadsheet programming• If not, go through a free tutorial on the web, e.g.

– http://www.compusmart.ab.ca/alummis/excel/exceltutorial.html

– http://www.usd.edu/trio/tut/excel/– http://www.jcu.edu/infoservice/training/excel/

start.htm

• More advanced material can be found in B.V. Liengme, A guide to Microsoft Excel for scientists and engineers (CUED Lib.)

A tip for spreadsheet modelling

• Clearly separate – Data (input to the model that is not under your control)– Design parameters (input that is under your control)– The actual model (logical description of the model)– Model output ( basis for decision, often includes

graphical elements)

• For larger models use separate worksheets

• Ideally, no cell in the logical model section contains a number– These cells only contain formulas and references to

other cells, e.g. in the data or parameter section

The five stages of computer modelling (Donald Knuth)

1. Decide what you want the model to do

2. Decide how to build the model3. Build the model4. Debug the model5. Trash stages 1 to 4 and start again,

now that you know what you really wanted in the first place

Don’t get frustrated: A modelling process is a learning process

The main benefit of building a (computer) model to analyse a problem is not the quantitative

information obtained as output of the model but the enhanced

understanding of the problem gained during the modelling

process

Today’s plan



Introduction to Monte Carlo Simulation


Example: A product launch• Main criterion: profitability

Profit = sales*(unit price- unit costs)-fixed costs

• Suppose fixed costs known • Price is a decision variable and influences

sales– Pricing decision depends, among other things,

on the level of competition and the reaction of the competitors to the launch (uncertain)

• Unit cost is uncertain and depends on prices for raw materials, energy, etc.

• Let’s look at a spreadsheet model…

The Flaw of the Average

• Plugging average values into uncertain cells can lead you astray

• The resulting bottom line (e.g. profit) is often not the average profit

• Mathematical Reason: E(f(X))=f(E(X)) for a random variable X holds ONLY if f is linear

What do we want to achieve with a simulation

model?• A single number, e.g. an average, gives very

little information if model input is uncertain• Manual what-if analysis is cumbersome and

biased• We want to estimate the distribution of

output cells – Give a graphical representation of this

distribution• cumulative distribution function• histogram

Example: Value at risk

• Given a distribution function for profit, we can read off the loss x such that there is an α% chance that the loss is at least £ x

• That number is called the α% value at risk (α% VAR)

10% VAR is roughly £500,0005% VAR is roughly £800,000

Profit Distribution

0.0%10.0%20.0%30.0%40.0%50.0%60.0%70.0%80.0%90.0%100.0%

-£1,500,000 -£1,000,000 -£500,000 £0 £500,000 £1,000,000 £1,500,000 £2,000,000 £2,500,000

Main steps of a simulation project

• Understand the problem• Programme and validate a deterministic

model• Determine the distribution of the

uncertain inputs– Collect relevant data

• Run the simulation experiments• Analyse the model output• Communicate the model and its output

Building the model…

• Essential questions before you model:– What are the questions that the model is to address?– What are the interesting outputs / performance

measures? – What is the appropriate level of detail?

• Discuss these questions with all stakeholders in the decision situation / process

• Validate the logic of your model before you enter uncertainty

Incorporating uncertainty • Estimate a probability distribution on the

basis of “hard” data whenever possible– Re-sampling from historic data (see product launch

example) is a simple and valid way of generating numbers for uncertain cells

• Sometimes you need subjective probabilities– Get estimates from many independent experts (Delphi

method)– Check sensitivity of outputs w.r.t. changing probabilities

• A triangular distribution is often a good starting point – Defined by lowest, highest and most likely value

Running a simulation • Important Rule: do more replications

than you expect necessary – at least several hundred

• Check that running averages of outputs have settled in a steady state– E.g. record after each replication the

average profit over all past replications • Do several runs and compare the

results

Analysing the output

• Use visual aids (histograms, distribution functions, scatter diagrams, etc.)

• Use statistics (confidence intervals, hypothesis tests)

• What are the implications of the results– for your model world?– for the real world?

Communicating the model and the results

• IMPORTANT: Communicate regularly with all stakeholders in the decision situation or process you are modelling– Build credibility for your model– Model becomes a “language”– Forces you to be as simple as possible– Forces you to be as relevant as possible

A word about simulation platforms

• Spreadsheets are useful for the simulation of many day-to-day decision problems

• They are NOT suitable for the simulation of complex processes, not least because they are difficult to validate

• Professional platforms are available that facilitate the programming and validation of complex models, e.g. through graphical interfaces

• Back to spreadsheets…

Preparing a spreadsheet model for simulation

• Write a model as if all inputs (data) were certain

• Mark clearly all uncertain input cells (colour them)

• Feed the input cells with appropriate randomly chosen numbers– The F9 key (recalculation) now produces one scenario

after the other

• Set up a worksheet for replications of your model, using the data table command – More specifics can be found in Ragsdale, chapter 12.4-

12.8

Feeding uncertain cells• Suppose cell x is known to be uniformly

distributed on the interval [0,1]– Put “=rand()” into cell x– Pressing F9 is equivalent to sampling from a

uniform distribution and putting the number into cell x

• randbetween(a,b) samples integers between a and b, including the integers a and b, with equal probability 1/(b-a+1)

• Analysis Tool Pack needs to be loaded for this to work (Tools –> add-ins)

More general distributions• Will see later how to generate more

general distributions, using the rand() function (inverse transform method)

• Example: norminv(rand(),a,b) samples from a normal distribution with mean a and standard deviation b

• Alternative random variable generators are provided with the add-ins in the books by Ragsdale and Savage

Today’s plan




How can a computer roll a die?

Issues to be addressed

• How can a computer roll a die? • How can we use past data? • What if random cells are

statistically dependent (e.g. annual demands over the next five years, stock prices of BMW and Daimler-Chrysler)

Random Number Generation:

Can a Computer Roll a Die?• Computers can only perform arithmetic

operations which by their very nature give deterministic and not random results.

• There is no such thing as a true random number generator on a digital computer. Random numbers generated by a computer are therefore sometimes called PSEUDO-RANDOM NUMBERS.

John von Neumann (1951)

Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin. For...there is no such thing as a random number - there are only methods to produce random numbers, and a strict arithmetic procedure of course is not such a method.... We are dealing here with mere ‘‘cooking recipes’’ for making digits...

BUT…

John von Neumann goes on by saying that these recipes

...probably...can not be justified, but should merely be judged by their results. Some statistical study of the digits generated by a given recipe should be made, but exhaustive tests are impractical. If the digits work well on one problem, they seem usually to be successful with others of the same type.

What can we hope for?

• An arithmetic method (‘recipe’) that generates a sequence of numbers which appear as if they were randomly chosen in the sense that they pass certain statistical tests

• Best understood and widely used are linear congruential RNGs possibly enhanced by a “shuffling technique”

Linear Congruential Methods

<<0 seed and <0increment

,<<0mutiplier ,0 modulus

,,, :method for the parametersinteger four Need -

division thisofremainder

thebe let and by divide :In words

) (mod :formulaRecursion -

(Seed) number initial Choose -

0

0

1

1

0

mrmc

mam

rcam

rmcar

mcarr

r

ii

ii

Random number streams

• A linear congruential method produces a sequence of numbers r0 ,r1 ,r2 ,...

• All numbers ri lie between 0 and m-1.• The conversion formula ui=(ri+0.5)/m

gives a sequence of numbers u1 , u2,... which lie between 0 and 1.

• We call the finite sequence u0 ,u1 ,u2 ,..., un a random number stream generated by a linear congruential RNG

Good Versus Bad Random Number Streams

• A random number stream u1,u2,u3,...,un should resemble a sequence of n independent samples from a uniform distribution on the interval [0,1].

• Whether a linear congruential RNG has this property depends a lot on the choice of the parameters a,c,m.

Example• What is the cycle length of the linear

congruential generator with modulus 15, multiplier 4 and increment 0?

• Try out various seeds until all numbers between 1 and 14 have shown up:

•1,4,1…•2,8,2…•3,12,3…•5,5…•etc.

• Why is this a bad RNG?

Cycle Length

• Want the sequence of random numbers to successively fill the whole interval [0,1] without leaving large gaps.

• Since all numbers ri lie between 0 and m-1, the sequence r0 ,r1 ,r2 ,.... will repeat after at most m -1 iterations – It may, however, start repeating much earlier

• The cycle length (or period) of a linear congruential RNG is the minimal length n of a sequence r0 ,r1 ,...,rn with rn = r0

• The RNG is said to have maximal cycle length if its cycle length is m-1

Built-in RNGs

• Be suspicious of built-in RNGs on your computer– You can assume that it has maximal cycle length but

that does not guarantee good statistical properties

• If possible, find out which generator is used and whether it has been tested in the literature

• If this information is not available (quite likely), you should at least perform some statistical tests before you use it for simulations

Some Popular RNGs

• Most RNGs are purely multiplicative(c=0)– m=231-1, a=75, c=0 (Learmouth and Lewis 1973)– m=231-1, a=630,360,016, c=0 (Payne et al. 1969)

• A comparison of various multipliers for the modulus m=231-1 has been done in a series of papers by Fishman and Moore (1981,1982,1986). – They found that the statistical performance of the

Payne et al. RNG is better than that of the Learmouth Lewis RNG

Statistical Testing of RNGs

Generate a random number stream u1,...,un and use statistical tests to see how closely the stream resembles a sample of size n drawn from a uniform distribution on the interval [0,1].

Setting up the test • We want to check how good a sequence

produced by an RNG ‘fits’ the uniform distribution

• Divide the interval [0,1] into k subintervals of equal size. (Typically 100<k<n/5)

• Determine the number fi of values in the random number stream that fall in the i-th subinterval

What do we expect? • If the ui’s are drawn from a uniform distribution

over [0,1] then we expect that fi is approximately n/k

• Chi-square Statistic

• Mathematical result: If the frequencies fi are obtained from a uniform distribution (and n>5k) then the distribution of the random variable is close to a chi-square distribution with k-1 degrees of freedom

2

1

2 k

nf

n

kii

k

( )

The goodness of fit test

• Suppose RNG is uniform (hypothesis) and let x be the observed value of the test statistic

• The hypothesis is (statistically) inconsistent with the observation if it is unlikely that the test statistic assumes a value as large as x – Reject hypothesis if P( x) is small, e.g. below

5%

• P( x) is called the p-value of the test– Reject if p-value is below the significance level (5%)

Higher dimensional goodness of fit tests

• d-dimensional vectors (u1 ,...,ud ), (ud+1 ,...,u2d ), ... should be uniformly distributed in the d-dimensional cube [0,1]d

• A division of [0,1] into k subintervals of equal size gives a division of the d-cube [0,1]d into dk subcubes of equal volume

• Generate n d-vectors U1 ,...,Un (each requiring the generation of d random numbers) and let fi1...id be the number of vectors having their j th component in the ij th subinterval

• Chi-squared test can be appropriately modified

Shuffling• One can enhance the performance of an

RNG by using another RNG to shuffle a set of random numbers generated by the first one.

• Start with an array U=(u1 ,...,un ) of n random numbers generated by RNG 1. Then repeat the following procedure: – Use RNG 2 to generate a random integer

M between 1 and n.– Select uM as your next random number

and replace uM in the array U by a new random number generated by RNG 1

Further Tests• Want consecutive draws to be independent• Test of independence can be done

– by estimating lag correlation– by examining the random number stream

for unbroken sub-sequences of maximal length within which the random numbers increase monotonically (run-ups or run-downs).

Standard reference: D.E. Knuth “The Art of Computer Programming”, Volume 2.

Serial correlation• Main disadvantage of congruential RNGs is that

they are not free of serial correlation– Example: Steel manufacturer simulates sales

• Main uncertainty drivers aare time between orders and order quantity

• Uncertain quantities are generated in pairs (time to next order, order quantity) during the simulation

• There should be no correlation between the two variables

• More sophisticated RNGs avoid this but they are computationally more expensive– There is generally a trade-off between statistical

performance and computing time

Where are we now?• You know that RNGs do not really generate

random numbers but rather pseudo-random numbers

• A lot of software has a built-in uniform RNG – You have seen a popular “recipe” for

uniform RNGs – You have seen how uniform RNGs can be

tested statistically

• How can we generate other random variables with a uniform RNG?

The Inverse Transform Method

(discrete RVs)• Given a stream of uniform random

numbers ui on [0,1], generate a stream of realizations xi of a discrete RV X with probability mass function pi=P(X=i) for i=0,1,2,...

• If U is a uniform RV then P(p1+...+ps-1 U p1+...+ps-1+ps) = ps

• Hence, given ui we set xi=i if

p1+...+ps-1 ui p1+...+ps-1+ps

The Inverse Transform Method(continuous RVs)

• Task: Given a stream of uniform random numbers ui,…,un on [0,1], generate a sample x1,…, xn from a continuous distribution with

cumulative distribution function F • Assumption for the method: F increases

strictly on the range of values x with 0<F(x)<1. – Then F(x)=z has a unique solution x=F-1(z) for

every z with 0<z<1– and F-1(z) is strictly increasing as well

The Inverse Transform Method(continuous RVs, cont.)

• If U is uniform RV on [0,1] then F(x) = P(U F(x)) = P(F-1 (U) F-1(F(x))) = P(F-1 (U) x)• Hence the RV F-1 (U) has the same distribution

as X• Calculate xi from ui by setting xi = F-1 (ui ), i.e.

by solving F(xi )= ui – Example Norminv(rand(),a,b)– Applies to exponential RV with mean m (F(x)=1-e-x/m)

What if you don’t know the distribution?

• Many distributions (normal, exponential, etc.) are defined by a small set of parameters (mean, variance, etc.) – If you know the parameters you know the

distribution

• Sometimes you don’t know the specific type of distribution but are instead faced with a set of historical company data for the uncertain cell – e.g. order quantities and times between

orders for a particular product

Resampling• In this case, you can create a historical

distribution function and sample from this• There an easy way of doing this

– Suppose you have a list of n data values (in a spreadsheet)

– Choose a number x between 1 and n at random (using randbetween(.,.) function)

– Select the x-th data in your list and plug it into the cell (using the vlookup or hlookup functions)

Statistical dependence• If uncertain cells are statistically dependent (e.g.

temperature and ice cream sales) then sampling independently from a list of temperatures for the temp-cell and from a list of ice cream sales for the sales cell gives a wrong result – try this with the product launch model with price and sales

• Instead, set up a list of associated (temperature,sales) pairs and then resample from the list of pairs and assign the temperature and sales from the same pair to the respective cells

What you have learned?• Management Sciene is about using computers to help

managers make good decisions and optimise their operational processes

• Spreadsheets – are used and accepted by managers everywhere in the world– are good fun to play with– have considerable drawbacks and should not be used for

complex models

• Monte Carlo Simulation is a good way of analysing uncertainty

• You know how a computer samples from a distribution – You can perform a simple test for a built-in RNG– You can use uniform RNGs to generate samples from more

general distributions

Homework • Test the Excel rand() function using a 1-

dimensional goodness of fit test with k=100 and n=500 – use chidist function (function wizard fx -> statistical)

to compute the tail probability of the chi-square distribution

• Consider the linear congruential RNG with m=64,a=21, c=1. – Use Excel to show that it has maximal cycle length.– Plot the overlapping pairs (u1,u2),(u2,u3),(u3,u4),.... Is

the emerging pattern a pattern that you would expect to be generated by a good RNG? Could this RNG cause problems in a simulation?

Homework• Generate a binomial distribution with

parameters 10 (# trials) and 0.2 (winning probability) in a spreadsheet and generate a histogram and a cumulative distribution function by running a simulation

• Set up the Excel spreadsheet for the product launch and look at the differences in the simulation results if you sample prices and sales independently rather than

dependently

Documents

Management Science (G12 2001/02) Part I (Michaelmas): Uncertainty Part II (Lent): Choice