
Probability Distributions with SCILAB

By

Gilberto E. Urroz, Ph.D., P.E.

Distributed by

infoClearinghouse.com

©2001 Gilberto E. Urroz. All Rights Reserved

A "zip" file containing all of the programs in this document (and other SCILAB documents at InfoClearinghouse.com) can be downloaded at the following site: http://www.engineering.usu.edu/cee/faculty/gurro/Software_Calculators/Scilab_Docs/ScilabBookFunctions.zip The author's SCILAB web page can be accessed at: http://www.engineering.usu.edu/cee/faculty/gurro/Scilab.html Please report any errors in this document to: [email protected]


Download at InfoClearinghouse.com 1 © 2001 Gilberto E. Urroz

PROBABILITY DISTRIBUTIONS

Discrete probability distributions
    Bernoulli probability distribution
    Binomial probability distribution
    Poisson probability distribution
    Geometric probability distribution
    Hypergeometric probability mass function
Cumulative distribution functions for discrete probability distributions
    SCILAB functions for discrete cumulative distribution functions
        SCILAB function cdfbin
    Discrete probability calculations through user-defined functions
        Combinations
        Binomial distribution
        Poisson distribution
        Geometric distribution
        Hypergeometric distribution
Continuous probability functions
    Factorials and the Gamma function
    The gamma distribution
    The exponential distribution
    The beta distribution
    The Weibull distribution
    The uniform distribution
    User-defined functions for continuous probability distributions
Continuous probability distributions used in statistical inference
    The Normal distribution
    The Student-t distribution
    The Chi-squared (χ²) distribution
    The F distribution
Applications of the normal distribution in data analysis
    Plotting a histogram and its corresponding normal curve
    Plotting data against their normal scores
    The lognormal distribution
Generating synthetic data
    Generating normally-distributed synthetic data
    Additional applications of function rand
    SCILAB function for generating synthetic data
    Examples of synthetic data generation using function grand
    Additional notes on function grand
    Pseudo-random generators
    Generating log-normally-distributed data
    Generating data that follows the Weibull distribution
    Generating data that follows the Student's t distribution
    Generating data that follows a discrete distribution
Statistical simulation
    Simulating traffic through a service station
    A user-defined function to simulate traffic through a service station
    Modeling traffic through a service station with random input
STIXBOX: a rudimentary statistics toolbox
Exercises


Probability Distributions

There are a number of mathematical functions that possess the properties of a probability mass function for discrete random variables or the properties of a probability density function for continuous random variables. In this section we introduce a number of those functions for the calculation of probabilities. Because these probability distributions depend on a finite number of parameters, they are typically referred to as parametric distributions.

Discrete probability distributions

Some of the most useful discrete probability distributions are the Bernoulli, binomial, Poisson, geometric, and hypergeometric distributions. The definitions of the corresponding probability mass and distribution functions are shown below. We also present expressions for the mean, variance, and standard deviation of these distributions.

Bernoulli probability distribution

The Bernoulli probability distribution applies to a discrete random variable that can only have the values 0 or 1, i.e., X = 0, 1. Let the probability of X = 1 be p, i.e., fX(1) = p; then fX(0) = 1 - p. This can be summarized as

$$f_X(x) = p^x (1-p)^{1-x}, \quad x = 0, 1.$$

The mean value of the distribution is

$$\mu_X = 0\cdot(1-p) + 1\cdot p = p.$$

The expectation of X², E(X²), is needed to calculate the variance Var(X) = E(X²) - µX². For the Bernoulli distribution,

$$E(X^2) = 0^2\cdot(1-p) + 1^2\cdot p = p,$$

and

$$\mathrm{Var}(X) = E(X^2) - \mu_X^2 = p - p^2 = p(1-p).$$

Thus, the standard deviation is

$$\sigma_X = \sqrt{p(1-p)}.$$

These results can be obtained using SCILAB as follows:

-->p=poly(0,'p')
 p  =
    p

-->X = [0,1]
 X  =
 !   0.    1. !

-->Prob = [1-p p]
 Prob  =
 !   1 - p    p !

-->muX = X*Prob'
 muX  =
    p

-->EX2 = X.^2*Prob'
 EX2  =
    p

-->VarX = EX2 - muX^2
 VarX  =
          2
     p - p
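The symbolic result above can also be checked numerically. The following sketch is written in Python rather than SCILAB and serves only as an independent cross-check; the function name bernoulli_moments is our own. It sums x·fX(x) and x²·fX(x) over x = 0, 1 for a few values of p and compares the resulting variance with p(1-p):

```python
def bernoulli_moments(p):
    """Mean and variance of a Bernoulli(p) variable, by direct summation over x = 0, 1."""
    values = [0, 1]
    probs = [1 - p, p]                          # fX(0) = 1-p, fX(1) = p
    mean = sum(x * f for x, f in zip(values, probs))
    ex2 = sum(x**2 * f for x, f in zip(values, probs))
    return mean, ex2 - mean**2                  # Var(X) = E(X^2) - mu^2

for p in (0.1, 0.35, 0.5):
    mean, var = bernoulli_moments(p)
    print(f"p = {p}: mean = {mean:.4f}, Var = {var:.4f}, p(1-p) = {p * (1 - p):.4f}")
```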

The Bernoulli distribution applies to simple binary experiments in which only two possible outcomes exist: 1 or 0, yes or no, success or failure. The value of the probability of success, p, can be obtained, for example, from the classical or from the frequency definition of probability. Bernoulli processes constitute the basis of the binomial and geometric distributions presented below.

Binomial probability distribution

If a Bernoulli experiment with success probability p is repeated n times, the probability of having x successes out of the n trials is given by

$$f_X(x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{\Gamma(n+1)}{\Gamma(x+1)\,\Gamma(n-x+1)}\, p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n;\ 0 < p < 1,$$

with

$$\mu_X = np, \quad \mathrm{Var}(X) = np(1-p), \quad \sigma_X = \sqrt{np(1-p)}.$$
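The two forms of the pmf above, one written with the binomial coefficient and one with Gamma functions, should agree for integer x. The short Python cross-check below evaluates both for n = 10, p = 0.10. The helper names are our own, not SCILAB's, and math.comb requires Python 3.8 or later. The sketch then confirms numerically that the pmf sums to 1, with mean np and variance np(1-p):

```python
import math

def binom_pmf_gamma(x, n, p):
    """Binomial pmf written with Gamma functions, as in the SCILAB deff below."""
    coef = math.gamma(n + 1) / (math.gamma(x + 1) * math.gamma(n - x + 1))
    return coef * p**x * (1 - p)**(n - x)

def binom_pmf_comb(x, n, p):
    """Same pmf using the exact integer binomial coefficient C(n, x)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.10
pmf = [binom_pmf_comb(x, n, p) for x in range(n + 1)]
same = all(math.isclose(binom_pmf_gamma(x, n, p), pmf[x]) for x in range(n + 1))
mean = sum(x * f for x, f in enumerate(pmf))
var = sum((x - mean)**2 * f for x, f in enumerate(pmf))
print(f"forms agree: {same}; sum = {sum(pmf):.6f}; mean = {mean:.4f}; Var = {var:.4f}")
```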

In SCILAB, we can define the probability mass function for the Binomial distribution as

-->deff('[f]=fX(x,n,p)',...
-->'f=gamma(n+1).*p.^x.*(1-p).^(n-x)./(gamma(x+1).*gamma(n-x+1))')

Next, we use this function to produce a plot of the probability mass function for n = 10, p = 0.10:

-->n=10; p=0.10; xx=[0:1:10]; yy = fX(xx,n,p);
-->xset('window',1);xset('mark',-9,2); plot2d(xx',yy',-9)
-->xtitle('Binomial pmf','x','fX(x)')


The following commands produce a plot of the cumulative distribution function:

-->yyy = [];for j = 1:n+1, yyy = [yyy sum(yy(1:j))]; end;

-->xset('window',2); xset('mark',-9,2); plot2d(xx',yyy',-9)

-->xtitle('Binomial cdf','x','FX(x)')

Poisson probability distribution:

If X is a binomial variable with n → ∞ and p → 0 such that the product λ = n⋅p remains fixed, the binomial pmf approaches the Poisson probability mass function

$$f_X(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots, \infty;\ \lambda > 0.$$

The Poisson pmf can be used to model the number of occurrences of a certain event in a given time period or per unit length, area, or volume, if λ represents the mean number of occurrences of the event per unit time, length, area, or volume, respectively.

The Poisson distribution has the parameters

$$\mu_X = \lambda, \quad \mathrm{Var}(X) = \lambda, \quad \sigma_X = \sqrt{\lambda}.$$
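Because the mean and the variance of the Poisson distribution are both equal to λ, a direct summation should reproduce λ twice. The Python sketch below is a cross-check outside SCILAB; poisson_pmf_list is a hypothetical helper. It builds the pmf with the recursion fX(k) = fX(k-1)·λ/k, which avoids computing large powers and factorials separately. It truncates the infinite sum at k = 100, where the remaining tail is negligible for λ = 2.5:

```python
import math

def poisson_pmf_list(lam, kmax):
    """Poisson pmf values fX(0), ..., fX(kmax) via fX(k) = fX(k-1) * lam / k."""
    f = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        f.append(f[-1] * lam / k)
    return f

lam = 2.5
f = poisson_pmf_list(lam, 100)
mean = sum(k * fk for k, fk in enumerate(f))
var = sum((k - mean)**2 * fk for k, fk in enumerate(f))
print(f"sum = {sum(f):.10f}, mean = {mean:.6f}, Var = {var:.6f}")
```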


In SCILAB we can define the Poisson distribution pmf as:

-->deff('[p]=fX(x,lambda)','p=exp(-lambda).*lambda.^x./gamma(x+1)')

A plot of the pmf for λ = 2.5 for values of x between 0 and 20:

-->lambda = 2.5; xx = [0:1:20]; yy =fX(xx,lambda);

-->xset('window',1);xset('mark',-9,2);plot2d(xx',yy',-9)

-->xtitle('Poisson pmf','x','fX(x)')

A plot of the corresponding cumulative distribution function follows:

-->yyy = []; for j = 1:21, yyy = [yyy sum(yy(1:j))]; end;


-->xset('window',2); xset('mark',-9,2); plot2d(xx',yyy',-9)

-->xtitle('Poisson cdf','x','FX(x)')

Geometric probability distribution:

Suppose that a Bernoulli experiment with probability of success p is repeated until a successful outcome occurs. Let X represent the number of trials needed to obtain the first success (including the successful trial); then X can be modeled with the geometric pmf:

$$f_X(x) = p(1-p)^{x-1}, \quad x = 1, 2, \ldots;\ 0 < p < 1.$$

The geometric distribution has the parameters

$$\mu_X = \frac{1}{p}, \quad \mathrm{Var}(X) = \frac{1-p}{p^2}, \quad \sigma_X = \frac{\sqrt{1-p}}{p}.$$
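The geometric mean and variance can be checked in the same way. This Python sketch is an illustration outside SCILAB. It truncates the infinite support at x = 2000, where (1-p)^2000 is negligible for p = 0.25, so the sums should reproduce 1/p = 4 and (1-p)/p² = 12:

```python
p = 0.25
xs = range(1, 2001)                          # truncated support x = 1, 2, ..., 2000
f = [p * (1 - p)**(x - 1) for x in xs]       # fX(x) = p (1-p)^(x-1)
mean = sum(x * fx for x, fx in zip(xs, f))
var = sum((x - mean)**2 * fx for x, fx in zip(xs, f))
print(f"mean = {mean:.6f} (1/p = {1 / p}), Var = {var:.6f} ((1-p)/p^2 = {(1 - p) / p**2})")
```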

The pmf for the geometric distribution and a plot of it are obtained in SCILAB by using:

-->deff('[f]=fX(p,x)','f=p*(1-p).^(x-1)')
-->p = 0.25; xx = [1:1:20]; yy = fX(p,xx);
-->xset('window',1);xset('mark',-9,2);plot2d(xx',yy',-9)
-->xtitle('geometric pmf','x','fX(x)')

A plot of the geometric distribution CDF is shown next:

-->yyy = [];for j = 1:20, yyy = [yyy sum(yy(1:j))]; end;
-->xset('window',2);xset('mark',-9,2);plot2d(xx',yyy',-9)
-->xtitle('geometric cdf','x','FX(x)')

Hypergeometric probability mass function

Suppose that we have a finite population of N elements, out of which a < N elements are defective. Suppose also that we take a sample of size n < N out of the population, and let X represent the number of defective elements in the sample of size n. The probability of X is given by the following pmf:

$$f_X(x; N, n, a) = \frac{\binom{a}{x}\binom{N-a}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, \ldots, n;\ 0 < a < N,\ 0 < n < N.$$


Parameters of the distribution are:

$$\mu_X = \frac{na}{N}, \quad \mathrm{Var}(X) = \frac{na(N-a)(N-n)}{N^2(N-1)}.$$

To produce plots of the hypergeometric probability mass function and cumulative distribution function, we first define a function accounting for the binomial coefficient:

-->deff('[CC]=C(n,r)','CC=gamma(n+1)./(gamma(r+1).*gamma(n-r+1))')

This function is incorporated in the definition of the hypergeometric function:

-->deff('[p]=fX(x)','p=C(a,x).*C(N-a,n-x)./C(N,n)')

Next, we produce plots of the hypergeometric pmf and CDF for N = 100, a = 25, and n = 20:

-->N=100;a=25;n=20;
-->xx=[0:1:20];yy=fX(xx);
-->xset('window',1);xset('mark',-9,2);
-->plot2d(xx',yy',-9);xtitle('Hypergeometric distribution','x','fX(x)');

-->yyy=[];for j=1:21, yyy=[yyy sum(yy(1:j))]; end;
-->xset('window',2);xset('mark',-9,2);
-->plot2d(xx',yyy',-9);xtitle('Hypergeometric distribution','x','FX(x)');


Cumulative distribution functions for discrete probability distributions

Of the five probability distributions presented above, namely Bernoulli, binomial, Poisson, geometric, and hypergeometric, three have a finite set of possible values (Bernoulli, binomial, hypergeometric) and two have an infinite set of possible values (Poisson and geometric). For the binomial, Poisson, and hypergeometric distributions, the cumulative distribution function is calculated using

$$F_X(x) = \sum_{k=0}^{x} f_X(k),$$

where fX(k) represents the corresponding probability mass function. (This is the definition used to produce the CDF graphics shown in the previous examples.) The cumulative distribution function FX(x) is defined over the same range of values as the discrete random variable X.

For the geometric distribution, whose domain starts at x = 1, the corresponding expression is

$$F_X(x) = \sum_{k=1}^{x} f_X(k) = \sum_{k=1}^{x} p(1-p)^{k-1}, \quad x = 1, 2, 3, \ldots$$
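The geometric sum in the last expression has the closed form FX(x) = 1 - (1-p)^x, because the finite geometric series gives p·[1 - (1-p)^x]/[1 - (1-p)]. A small Python check, using helper names of our own, compares the summation with the closed form. For p = 0.5 and x = 6, both give 1 - 0.5^6 = 0.984375:

```python
def geom_cdf_sum(x, p):
    """FX(x) as the sum of p(1-p)^(k-1) for k = 1..x."""
    return sum(p * (1 - p)**(k - 1) for k in range(1, x + 1))

def geom_cdf_closed(x, p):
    """Closed form of the same sum: 1 - (1-p)^x."""
    return 1 - (1 - p)**x

for p in (0.25, 0.5):
    for x in (1, 3, 6, 10):
        print(p, x, geom_cdf_sum(x, p), geom_cdf_closed(x, p))
```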

SCILAB functions for discrete cumulative distribution functions

SCILAB provides a number of functions for operations with cumulative distribution functions. For discrete distributions the following functions are provided:

• cdfbin - Binomial distribution
• cdfnbn - Negative binomial distribution
• cdfpoi - Poisson distribution (described in detail in Chapter …)

Information on these functions can be obtained by using the help function. Next, we describe the use of function cdfbin.

SCILAB function cdfbin

There are four different forms of the call to function cdfbin:

[P,Q]=cdfbin("PQ",S,Xn,Pr,Ompr)
[S]=cdfbin("S",Xn,Pr,Ompr,P,Q)
[Xn]=cdfbin("Xn",Pr,Ompr,P,Q,S)
[Pr,Ompr]=cdfbin("PrOmpr",P,Q,S,Xn)

The variable Pr in these calls represents the probability of success on any given trial, which we referred to as p in the definition of the Bernoulli pmf shown earlier. On the other hand, Ompr represents 1 - Pr (in some references this is referred to as q = 1 - p), i.e., the probability of failure in a given trial. The variable P represents the probability P(X≤S), where X ~ Binomial(Xn,Pr), while Q = 1 - P.

The first argument in the calls to function cdfbin is a string that determines which variable is being sought, according to:

"PQ" - calculate the probabilities P = P(X≤S) and Q = 1 - P
"S" - calculate the inverse CDF, i.e., calculate S from P = P(X≤S)
"Xn" - calculate the number of trials (n in the definition of the pmf)
"PrOmpr" - calculate the probability of success in any given trial (p in the pmf definition)

Care should be exercised in keeping the proper order of the variables in the calls to the function.

Some examples follow:

-->n = 10; x = 6; p = 0.35; q = 1-p;

-->[P,Q] = cdfbin('PQ',x,n,p,q) //Calculating probabilities
 Q  =
    .0260243
 P  =
    .9739757

-->n=20;p=0.35;q=1-p;P=0.75;Q=1-P;

-->x = cdfbin("S",n,p,q,P,Q) //Calculating the inverse CDF
 x  =
    7.9132062

-->[p,q] = cdfbin("PrOmpr",P,Q,x,n) //Calculating p and q = 1-p
 q  =
    .7391494
 p  =
    .2608506
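As an independent check of the "PQ" call above, P = P(X≤6) for n = 10 and p = 0.35 can be summed directly from the binomial pmf. This Python sketch uses binom_pq, a hypothetical helper that is not part of SCILAB. It should reproduce P ≈ 0.9739757 and Q ≈ 0.0260243:

```python
import math

def binom_pq(s, n, p):
    """Return (P, Q) with P = P(X <= s) for X ~ Binomial(n, p) and Q = 1 - P."""
    P = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(s + 1))
    return P, 1 - P

P, Q = binom_pq(6, 10, 0.35)
print(f"P = {P:.7f}, Q = {Q:.7f}")
```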

Notes: Use help cdfnbn to learn more about the function that implements the negative binomial distribution. The function cdfpoi was described in detail in Chapter 13.

Discrete probability calculations through user-defined functions

Besides the few pre-programmed cumulative distribution functions provided by SCILAB, probabilities can be calculated by defining probability mass and cumulative distribution functions for the different distributions presented earlier. The basic definitions of probabilities in terms of probability mass and cumulative distribution functions are:

$$P(X = x) = f_X(x) \quad \text{(pmf)}$$

$$P(X \le x) = \sum_{k=0}^{x} f_X(k) \quad \text{(cdf for the binomial, Poisson, and hypergeometric distributions)}$$

$$P(X \le x) = \sum_{k=1}^{x} f_X(k) \quad \text{(cdf for the geometric distribution)}$$

We will define the following functions for the distributions shown earlier:

Distribution      pmf            CDF
Binomial          b(x,n,p)       B(x,n,p)
Poisson           p(x,lambda)    P(x,lambda)
geometric         g(x,p)         G(x,p)
hypergeometric    h(x,N,n,a)     H(x,N,n,a)

The following is a SCILAB script, called DiscreteProbabilityFunctions, which includes the definitions for the eight function calls listed in the table immediately above:

//Defining discrete probability distributions
deff('[CC]=C(n,r)','CC=gamma(n+1)./(gamma(r+1).*gamma(n-r+1))') //Binomial coefficient
deff('[bb]=b(x,n,p)','bb=C(n,x).*p.^x.*(1-p).^(n-x)') //Binomial pmf
deff('[BB]=B(x,n,p)','BB=sum(b([0:1:x],n,p))') //Binomial CDF
deff('[pp]=p(x,lambda)','pp=exp(-lambda).*lambda.^x./gamma(x+1)') //Poisson pmf
deff('[PP]=P(x,lambda)','PP=sum(p([0:1:x],lambda))') //Poisson CDF
deff('[gg]=g(x,p)','gg=p.*(1-p).^(x-1)') //Geometric pmf
deff('[GG]=G(x,p)','GG=sum(g([1:x],p))') //Geometric CDF
deff('[hh]=h(x,N,n,a)','hh=C(a,x).*C(N-a,n-x)./C(N,n)') //Hypergeometric pmf
deff('[HH]=H(x,N,n,a)','HH=sum(h([0:1:x],N,n,a))') //Hypergeometric CDF

To execute the script that defines the discrete probability functions use:

-->exec('DiscreteProbabilityFunctions')

Combinations

The function C(n,r) represents combinations of n elements taken r by r, or the binomial coefficient:

-->C(10,5)
 ans  =
    252.

This is a vector of values of C(n,r) for n = 10 and r = 0, 1, …, 10:

-->C10=[];for j=0:10,C10=[C10 C(10,j)]; end; C10
 C10  =
 !   1.    10.    45.    120.    210.    252.    210.    120.    45.    10.    1. !
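Python's standard library provides the same binomial coefficients as exact integers through math.comb (Python 3.8+). This is convenient for checking the Gamma-function based C(n,r) above, whose results are floating-point numbers:

```python
import math

row = [math.comb(10, r) for r in range(11)]            # exact integers C(10, r)
gamma_based = [math.gamma(11) / (math.gamma(r + 1) * math.gamma(11 - r))
               for r in range(11)]                      # same formula as C(n,r) above
print(row)
print(gamma_based)
```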

Binomial distribution

For the binomial distribution with n = 10 and p = 0.25, the following call to function b(x,n,p) calculates the probability P(X=2) = b(2,10,0.25):

-->b(2,10,0.25)
 ans  =
    .2815676

The following is a list of values of the binomial pmf for n = 10, p = 0.25, for all possible values of x = 0, 1, …, 10:

-->b10=[];for j=0:10,b10=[b10 b(j,10,0.25)]; end; b10
 b10  =
         column  1 to 7
 !   .0563135    .1877117    .2815676    .2502823    .145998    .0583992    .016222 !
         column  8 to 11
 !   .0030899    .0003862    .0000286    9.537E-07 !

The binomial CDF for x = 2, n = 10, p = 0.25 is calculated with the following call to function B(x,n,p). This value represents P(X≤2):

-->B(2,10,0.25)
 ans  =
    .5255928

This value represents P(X>2) = 1 - P(X≤2):

-->1-B(2,10,0.25)
 ans  =
    .4744072

The following is a list of values of the binomial CDF for n = 10, p = 0.25, for all values of x = 0, 1, …, 10:

-->B10=[];for j=0:10,B10=[B10 B(j,10,0.25)]; end; B10
 B10  =
         column  1 to 7
 !   .0563135    .2440252    .5255928    .7758751    .9218731    .9802723    .9964943 !
         column  8 to 11
 !   .9995842    .9999704    .9999990    1. !
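The vector of CDF values can be obtained from the pmf vector with a single cumulative sum, instead of calling B once per value of x. In SCILAB, applying cumsum to b10 should do the same. A Python sketch of the idea, using itertools.accumulate, follows:

```python
import math
from itertools import accumulate

n, p = 10, 0.25
pmf = [math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
cdf = list(accumulate(pmf))        # cdf[x] = P(X <= x)
print([round(v, 7) for v in cdf])
print("P(X=2) =", round(pmf[2], 7), " P(X<=2) =", round(cdf[2], 7), " P(X>2) =", round(1 - cdf[2], 7))
```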

Poisson distribution

The pmf of the Poisson distribution can be used to calculate probabilities such as P(X=2) for λ = 5.2:

-->p(2,5.2)
 ans  =
    .0745840

For P(X=6), the Poisson distribution with λ = 5.2 produces:

-->p(6,5.2)
 ans  =
    .1514803

The cumulative distribution function for the Poisson distribution with λ = 5.2 provides the probability P(X≤6):

-->P(6,5.2)
 ans  =
    .7323933

The following SCILAB commands produce a vector of values of the Poisson CDF for x = 1, 2, …, 10, and λ = 5.2:

-->P10=[];for j=1:10, P10=[P10 P(j,5.2)]; end; P10
 P10  =
         column  1 to 7
 !   .0342027    .1087867    .2380655    .406128    .580913    .7323933    .8449216 !
         column  8 to 10
 !   .9180650    .9603256    .9823011 !
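The Poisson CDF can also be accumulated term by term with the recursion fX(k) = fX(k-1)·λ/k, which avoids evaluating λ^x and x! separately. The following Python sketch uses poisson_cdf, a name of our own. It should reproduce P(X≤1) ≈ 0.0342027 and P(X≤6) ≈ 0.7323933 for λ = 5.2:

```python
import math

def poisson_cdf(x, lam):
    """P(X <= x) for a Poisson(lam) variable, summing fX(0..x) recursively."""
    term = math.exp(-lam)          # fX(0)
    total = term
    for k in range(1, x + 1):
        term *= lam / k            # fX(k) = fX(k-1) * lam / k
        total += term
    return total

lam = 5.2
print([round(poisson_cdf(x, lam), 7) for x in range(1, 11)])
```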

Geometric distribution

The probabilities P(X=3) and P(X=5) using the geometric distribution with p = 0.50 are calculated as:

-->g(3,0.50)
 ans  =
    .125

-->g(5,0.50)
 ans  =
    .03125

The following example shows a way to calculate a vector of values of the geometric distribution pmf for x = 1, 2, …, 10:

-->g([1:10],0.5)
 ans  =
         column  1 to 9
 !   .5    .25    .125    .0625    .03125    .015625    .0078125    .0039063    .0019531 !
         column 10
 !   .0009766 !

The following evaluations of the geometric distribution CDF are used to calculate the probabilities P(X≤6), P(X≤3), and P(X≤1), respectively:

-->G(6,0.5)
 ans  =
    .984375

-->G(3,0.5)
 ans  =
    .875

-->G(1,0.5)
 ans  =
    .5

A vector of values of the geometric distribution CDF, with p = 0.5, is produced by using the following commands:

-->G10=[];for j=1:10, G10=[G10 G(j,0.5)]; end; G10
 G10  =
         column  1 to 9
 !   .5    .75    .875    .9375    .96875    .984375    .9921875    .9960938    .9980469 !
         column 10
 !   .9990234 !

Hypergeometric distribution

The next line assigns values to the parameters N, n, and a in the hypergeometric distribution:

-->N=100;n=20;a=35;

The probability P(X=12) for the hypergeometric distribution with the parameters N, n, and a defined above is calculated as:

-->h(12,N,n,a)
 ans  =
    .0078581

The cumulative distribution function for the hypergeometric distribution for x = 12 is calculated as follows:

-->H(12,N,n,a)
 ans  =
    .9976693

The value just calculated represents the probability P(X≤12). The next statement generates a vector of values of the hypergeometric pmf for x = 0, 1, 2, …, 20:

-->h([0:20],N,n,a)
 ans  =
         column  1 to 7
 !   .0000529    .0008046    .0055295    .0228093    .0633073    .1256018    .1847085 !
         column  8 to 14
 !   .2060210    .1768671    .1179114    .0613139    .0248839    .0078581    .0019176 !
         column 15 to 21
 !   .0003575    .0000501    .0000051    3.698E-07    1.761E-08    4.924E-10    6.060E-12 !

The next line collects the pmf values h(x) for x = 1, 2, …, 10 into a vector. Replacing h(j,N,n,a) with H(j,N,n,a) in the loop would produce the corresponding CDF values P(X≤x):

-->H10=[];for j=1:10,H10=[H10 h(j,N,n,a)]; end; H10
 H10  =
         column  1 to 7
 !   .0008046    .0055295    .0228093    .0633073    .1256018    .1847085    .2060210 !
         column  8 to 10
 !   .1768671    .1179114    .0613139 !
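For comparison, here is a Python sketch that computes the hypergeometric pmf and CDF with exact integer binomial coefficients for N = 100, n = 20, a = 35. The helper hyper_pmf is hypothetical. Because math.comb returns 0 when the lower index exceeds the upper one, impossible values of x get probability 0. The sketch also checks the mean and variance formulas given earlier:

```python
import math
from itertools import accumulate

def hyper_pmf(x, N, n, a):
    """P(X = x) = C(a, x) C(N-a, n-x) / C(N, n)."""
    return math.comb(a, x) * math.comb(N - a, n - x) / math.comb(N, n)

N, n, a = 100, 20, 35
pmf = [hyper_pmf(x, N, n, a) for x in range(n + 1)]
cdf = list(accumulate(pmf))                       # cdf[x] = P(X <= x)
mean = sum(x * f for x, f in enumerate(pmf))
var = sum((x - mean)**2 * f for x, f in enumerate(pmf))
print("h(12) =", round(pmf[12], 7), " H(12) =", round(cdf[12], 7))
print("mean =", mean, " n*a/N =", n * a / N)
print("Var =", var, " formula =", n * a * (N - a) * (N - n) / (N**2 * (N - 1)))
```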

Continuous probability functions

In this section we describe several continuous probability distributions including the gamma, exponential, beta, and Weibull distributions. Some of these distributions make use of the Gamma function, Γ(x), which is defined next.

__________________________________________________________________________________

Factorials and the Gamma function (see also Chapter 13)

The Gamma function is defined by

$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx, \quad \alpha > 0.$$

This function has the property that

$$\Gamma(\alpha) = (\alpha-1)\,\Gamma(\alpha-1), \quad \alpha > 1,$$

therefore, it can be related to the factorial of a number, i.e.,

$$\Gamma(\alpha) = (\alpha-1)!,$$

when α is a positive integer.
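Python's math.gamma can be used to confirm both properties numerically: the recursion Γ(α) = (α-1)Γ(α-1) for a non-integer α, and the factorial relation for integer α. A short sketch:

```python
import math

alpha = 3.7
lhs = math.gamma(alpha)
rhs = (alpha - 1) * math.gamma(alpha - 1)            # Γ(α) = (α-1) Γ(α-1)
print(lhs, rhs)

for k in range(1, 8):
    print(k, math.gamma(k), math.factorial(k - 1))   # Γ(k) = (k-1)! for integer k
```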

Factorials have applications in combinatorics (calculation of combinations and permutations, etc.) and in some discrete probability distributions (e.g., the binomial probability distribution), while the gamma function has applications in continuous probability distributions (e.g., the gamma probability distribution).

__________________________________________________________________________________

The gamma distribution

The probability density function (pdf) for the gamma distribution is given by

$$f_X(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1}\exp\left(-\frac{x}{\beta}\right), \quad x > 0,\ \alpha > 0,\ \beta > 0.$$

The parameters α and β are referred to, respectively, as the shape and scale parameters of the gamma distribution. Other parameters of this distribution are:

$$\mu_X = \alpha\beta, \quad \sigma_X^2 = \alpha\beta^2.$$

SCILAB provides function cdfgam for operations with the gamma distribution CDF. The calls to this function take the form

[P,Q]=cdfgam("PQ",X,Shape,Scale)
[X]=cdfgam("X",Shape,Scale,P,Q)
[Shape]=cdfgam("Shape",Scale,P,Q,X)
[Scale]=cdfgam("Scale",P,Q,X,Shape)

where P = P(XX<X), Q = 1 - P, and Shape = α, with XX ~ gamma(α,β). The results shown below are consistent with the argument Scale acting as a rate, i.e., Scale = 1/β in terms of the pdf given above. For instance, with Shape = 2 and Scale = 3, the pdf with β = 1/3 gives P(X<3) = 1 - e^(-9)(1 + 9) ≈ 0.9987659, which matches the second call below.

The following are examples of applications of function cdfgam. The three calls below determine, respectively, the probabilities P = P(X<10), P = P(X<3), and P = P(X<0.5), as well as the probabilities of the complement, Q = 1 - P, for the gamma distribution with Shape = 2 and Scale = 3 (i.e., α = 2, β = 1/3):

-->[P,Q]=cdfgam("PQ",10,2,3)
 Q  =
    2.901E-12
 P  =
    1.

-->[P,Q]=cdfgam("PQ",3,2,3)
 Q  =
    .0012341
 P  =
    .9987659

-->[P,Q]=cdfgam("PQ",0.5,2,3)
 Q  =
    .5578254
 P  =
    .4421746
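The interpretation of cdfgam's Scale argument can be checked independently. For an integer shape α, the gamma CDF with scale β has the closed form FX(x) = 1 - e^(-x/β) Σ_{k=0}^{α-1} (x/β)^k/k!, known as the Erlang formula. The Python sketch below uses gamma_cdf_int_shape, a helper of our own. It reproduces the cdfgam outputs above only when β = 1/Scale = 1/3:

```python
import math

def gamma_cdf_int_shape(x, alpha, beta):
    """P(X <= x) for a gamma distribution with integer shape alpha and scale beta
    (pdf proportional to x**(alpha-1) * exp(-x/beta)), using the Erlang formula."""
    z = x / beta
    return 1 - math.exp(-z) * sum(z**k / math.factorial(k) for k in range(alpha))

scale_arg = 3                                           # the Scale value passed to cdfgam above
for x in (3, 0.5):
    as_rate = gamma_cdf_int_shape(x, 2, 1 / scale_arg)  # beta = 1/Scale
    as_scale = gamma_cdf_int_shape(x, 2, scale_arg)     # beta = Scale
    print(f"x = {x}: beta = 1/Scale -> {as_rate:.7f}, beta = Scale -> {as_scale:.7f}")
```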

The next call to function cdfgam calculates the inverse of the gamma CDF, i.e., the value of x for which P = P(X<x) = 0.4, where X follows the gamma distribution with Shape = 2 and Scale = 3:

-->x=cdfgam('X',2,3,0.4,0.6)
 x  =
    .4588071

The next call to the function is used to calculate the shape parameter, α, given a probability P = P(X<0.3) = 0.6, Q = 1 - P = 0.4, with X following the gamma distribution with Scale = 2:

-->alpha = cdfgam('Shape',2,0.6,0.4,0.3)
 alpha  =
    .7190660

The next call to function cdfgam calculates the Scale argument (the rate 1/β), given a probability P = P(X<1.2) = 0.2, Q = 1 - P = 0.8, with X following the gamma distribution with α = 3:

-->rate = cdfgam('Scale',0.2,0.8,1.2,3)
 rate  =
    1.2792035

The exponential distribution

The exponential distribution is the gamma distribution with α = 1. Its pdf is given by

$$f_X(x) = \frac{1}{\beta}\exp\left(-\frac{x}{\beta}\right), \quad x > 0,\ \beta > 0,$$

while its cdf is given by

$$F_X(x) = 1 - \exp\left(-\frac{x}{\beta}\right), \quad x > 0,\ \beta > 0.$$

Parameters of the exponential distribution include:

$$\mu_X = \beta, \quad \sigma_X = \beta.$$

The beta distribution


The pdf for the beta distribution is given by

$$f_X(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, \quad 0 < x < 1,\ \alpha > 0,\ \beta > 0.$$

As in the case of the gamma distribution, the corresponding cdf for the beta distribution is given by an integral that, in general, has no closed-form solution.

The parameters of the beta distribution include

$$\mu_X = \frac{\alpha}{\alpha+\beta}, \quad \mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$

SCILAB provides function cdfbet for operations with the cumulative distribution function of the beta distribution. Calls to the function are the following:

[P,Q]=cdfbet("PQ",X,Y,A,B)
[X,Y]=cdfbet("XY",A,B,P,Q)
[A]=cdfbet("A",B,P,Q,X,Y)
[B]=cdfbet("B",P,Q,X,Y,A)

In these calls P = P(XX<X), Y = 1 - X, Q = 1 - P, and A, B are the parameters α and β of the beta distribution.

Next, we present some applications of function cdfbet. The first example calculates the probability P(X<0.35) for the beta distribution with α = 2, β = 3:

-->[P,Q]=cdfbet('PQ',0.35,1-0.35,2,3)
 Q  =
    .5629813
 P  =
    .4370187

An example that calculates the inverse of the beta cdf, i.e., the value of x for which P = P(X<x) = 0.75, for the beta distribution with α = 3, β = 5 is presented next:

-->[X,Y] = cdfbet("XY",3,5,0.75,1-0.75)
 Y  =
    .5139030
 X  =
    .4860970

The next two examples show how to obtain the parameters α and β of the beta distribution. In the first application, β = 3.5 and α is calculated from X = 0.3, Y = 1 - X = 0.7, P = P(X<0.3) = 0.4, and Q = 1 - P = 0.6. In the second application, α = 1.5 and β is calculated from X = 0.8, Y = 0.2, P = P(X<0.8) = 0.6, and Q = 0.4:

-->alpha = cdfbet("A",3.5,0.4,0.6,0.3,0.7)
 alpha  =
    2.0459494

-->beta = cdfbet("B",0.6,0.4,0.8,0.2,1.5)
 beta  =
    .7453948
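For integer α and β the beta CDF does have a closed form: I_x(α,β) = Σ_{j=α}^{α+β-1} C(α+β-1, j) x^j (1-x)^{α+β-1-j}. This is the probability that a binomial variable with α+β-1 trials and success probability x is at least α. The Python sketch below uses beta_cdf_int, a name of our own, to check the first cdfbet example: P(X<0.35) ≈ 0.4370187 for α = 2, β = 3.

```python
import math

def beta_cdf_int(x, a, b):
    """Beta CDF for positive integers a, b, via the binomial-sum identity."""
    m = a + b - 1
    return sum(math.comb(m, j) * x**j * (1 - x)**(m - j) for j in range(a, m + 1))

P = beta_cdf_int(0.35, 2, 3)
print(f"P = {P:.7f}, Q = {1 - P:.7f}")

# For alpha = 2, beta = 3 the sum reduces to the polynomial 6x^2 - 8x^3 + 3x^4
x = 0.35
print(6 * x**2 - 8 * x**3 + 3 * x**4)
```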

The Weibull distribution

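The pdf for the Weibull distribution is given by

$$f_X(x) = \alpha\beta\, x^{\beta-1}\exp\left(-\alpha x^{\beta}\right), \quad x > 0,\ \alpha > 0,\ \beta > 0,$$

while the corresponding cdf is given by

$$F_X(x) = 1 - \exp\left(-\alpha x^{\beta}\right), \quad x > 0,\ \alpha > 0,\ \beta > 0.$$

Parameters of this distribution are:

$$\mu_X = \alpha^{-1/\beta}\,\Gamma\left(1+\frac{1}{\beta}\right), \quad \mathrm{Var}(X) = \alpha^{-2/\beta}\left[\Gamma\left(1+\frac{2}{\beta}\right) - \Gamma^2\left(1+\frac{1}{\beta}\right)\right].$$

The uniform distribution

The uniform distribution for a continuous random variable is defined for values of X such that a < x < b. The corresponding probability density function is given by

$$f_X(x) = \frac{1}{b-a}, \quad a < x < b.$$

The cumulative distribution function is

$$F_X(x) = \frac{x-a}{b-a}, \quad a < x < b.$$

The parameters of the uniform distribution are:

$$\mu_X = \frac{a+b}{2}, \quad \mathrm{Var}(X) = \frac{(b-a)^2}{12}.$$

The following function definition implements the cumulative distribution function for the uniform distribution in SCILAB. The values of a and b are taken from the SCILAB workspace when FX is called:

-->deff('[FF]=FX(x)','FF=(x-a)/(b-a)')

For values of a = 2.5 and b = 3.2, we proceed to calculate some probabilities: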


--> a = 2.5; b = 3.2;

First, we calculate P(X<2.7) = FX(2.7):

-->FX(2.7) ans =

.2857143

Next, we calculate P(X>3) = 1 - P(X<3) = 1 - FX(3):

-->1-FX(3) ans =

.2857143

The following example calculates P(2.8<X<3) = P(X<3) - P(X<2.8) = FX(3) - FX(2.8):

-->FX(3)-FX(2.8) ans =

.2857143
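The SCILAB function FX defined above applies (x-a)/(b-a) for any x, so it returns values below 0 or above 1 outside the interval a < x < b. The Python sketch below clamps the CDF to 0 and 1 outside the interval; this clamping is an extension of ours, not part of the original definition. It reproduces the three probabilities above:

```python
def uniform_cdf(x, a, b):
    """CDF of the uniform distribution on (a, b), clamped outside the interval."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

a, b = 2.5, 3.2
print(uniform_cdf(2.7, a, b))                            # P(X < 2.7)
print(1 - uniform_cdf(3.0, a, b))                        # P(X > 3)
print(uniform_cdf(3.0, a, b) - uniform_cdf(2.8, a, b))   # P(2.8 < X < 3)
print(uniform_cdf(4.0, a, b), uniform_cdf(2.0, a, b))    # outside (a, b)
```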

User-defined functions for continuous probability distributions

The following SCILAB script defines the probability density function and the cumulative distribution function for four selected continuous distributions: gamma, exponential, beta, and Weibull. The script is called ContinuousProbabilityFunctions, and is invoked by using:

-->exec('ContinuousProbabilityFunctions')

The listing of the script is the following:

//Define selected continuous probability functions
deff('[gg]=gam(x,a,b)','gg=x.^(a-1).*exp(-x./b)./(b.^a.*gamma(a))') //gamma pdf
deff('[GG]=GAM(x,a,b)','GG=intg(0,x,gam)') //gamma CDF, by numerical integration
deff('[ee]=eex(x,b)','ee=exp(-x./b)./b') //exponential pdf
deff('[EE]=EEX(x,b)','EE=1-exp(-x./b)') //exponential CDF
deff('[bb]=bet(x,a,b)',...
 'bb=gamma(a+b).*x.^(a-1).*(1-x).^(b-1)./(gamma(a).*gamma(b))') //beta pdf
deff('[BB]=BET(x,a,b)','BB=intg(0,x,bet)') //beta CDF, by numerical integration
deff('[ww]=w(x,a,b)','ww=a.*b.*x.^(b-1).*exp(-a.*x.^b)') //Weibull pdf
deff('[WW]=W(x,a,b)','WW=1-exp(-a.*x.^b)') //Weibull CDF

The functions defined through the script are summarized in the following table:

Distribution    pdf            CDF
gamma           gam(x,α,β)     GAM(x,α,β)
exponential     eex(x,β)       EEX(x,β)
beta            bet(x,α,β)     BET(x,α,β)
Weibull         w(x,α,β)       W(x,α,β)

Applications of these functions follow, starting with the gamma distribution.

The gamma distribution

First, we plot the pdf of the distribution using α = 2 and β = 3:


-->xx=(0:0.1:20);yy=gam(xx,2,3);

-->plot(xx,yy,'x','fX(x)','gamma distribution')

A plot of the gamma distribution CDF for α = 2 and β = 3 is obtained by using:

-->yyy=[];for x=0:0.1:20, yyy=[yyy GAM(x,2,3)]; end;

-->plot(xx,yyy,'x','FX(x)','gamma distribution')

The CDF can be used to calculate probabilities. The next three lines calculate the probabilities P(X<5) = FX(5), P(6<X<11) = FX(11) - FX(6), and P(X>7.5) = 1 - P(X<7.5) = 1 - FX(7.5):

-->GAM(5,2,3) ans = .4963317

-->GAM(11,2,3)-GAM(6,2,3) ans = .2867187

-->1-GAM(7.5,2,3) ans = .2872975
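GAM obtains the CDF by numerically integrating the pdf with intg. The same idea can be sketched in Python with the composite Simpson rule. Simpson's rule is our choice of quadrature, not necessarily the method intg uses. The sketch assumes α ≥ 1, so that the pdf is finite at x = 0, and its results should match the three values above:

```python
import math

def simpson(f, lo, hi, n=1000):
    """Composite Simpson's rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

def gam_pdf(x, a, b):
    """Gamma pdf with shape a and scale b, as in the SCILAB function gam."""
    return x**(a - 1) * math.exp(-x / b) / (b**a * math.gamma(a))

def gam_cdf(x, a, b):
    """P(X < x) by integrating gam_pdf from 0 to x (assumes a >= 1)."""
    return simpson(lambda t: gam_pdf(t, a, b), 0.0, x)

print(gam_cdf(5, 2, 3))                      # P(X < 5)
print(gam_cdf(11, 2, 3) - gam_cdf(6, 2, 3))  # P(6 < X < 11)
print(1 - gam_cdf(7.5, 2, 3))                # P(X > 7.5)
```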

The exponential distribution


The following commands generate plots of the pdf and CDF for the exponential distribution using β = 2.5:

-->xx=(0:0.1:20);yy=eex(xx,2.5);
-->plot(xx,yy,'x','fX(x)','exponential distribution')

-->yyy=[];for x=0:0.1:20, yyy=[yyy EEX(x,2.5)]; end;

-->plot(xx,yyy,'x','FX(x)','exponential distribution')

The following probability calculations for the exponential distribution with β = 2.5 are presented next: P(X<6) = FX(6), P(X>4) = 1 - P(X<4) = 1 - FX(4), and P(4<X<6) = FX(6) - FX(4):

-->EEX(6,2.5) ans = .9092820

-->1-EEX(4,2.5) ans = .2018965

-->EEX(6,2.5)-EEX(4,2.5) ans = .1111786
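Because the exponential CDF is available in closed form, the three probabilities above can be reproduced directly. In particular, P(4<X<6) reduces to exp(-4/β) - exp(-6/β). A Python sketch follows; eex_cdf is a name of our own:

```python
import math

def eex_cdf(x, beta):
    """Exponential CDF with mean beta: 1 - exp(-x/beta) for x > 0, else 0."""
    return 1 - math.exp(-x / beta) if x > 0 else 0.0

beta = 2.5
p1 = eex_cdf(6, beta)                       # P(X < 6)
p2 = 1 - eex_cdf(4, beta)                   # P(X > 4)
p3 = eex_cdf(6, beta) - eex_cdf(4, beta)    # P(4 < X < 6)
print(round(p1, 7), round(p2, 7), round(p3, 7))
print(math.exp(-4 / beta) - math.exp(-6 / beta))   # same interval probability
```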

The beta distribution


To plot the pdf and CDF of the beta distribution with α = 2.5, β = 3.5, we use:

-->xx=(0:0.05:1);yy=bet(xx,2.5,3.5);

-->plot(xx,yy,'x','fX(x)','beta distribution')

-->yyy=[];for x=0:0.05:1, yyy=[yyy BET(x,2.5,3.5)]; end;

-->plot(xx,yyy,'x','FX(x)','beta distribution')

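The following probability calculations for the beta distribution with α = 2.5 and β = 3.5 are presented next: P(X<0.25) = FX(0.25), P(X>0.75) = 1 - P(X<0.75) = 1 - FX(0.75), and P(0.3<X<0.8) = FX(0.8) - FX(0.3). Note that the numerical values shown below were obtained with (1-x).^b, rather than (1-x).^(b-1), in the definition of bet. Since the beta pdf requires the exponent β-1, these values are not valid beta probabilities, and evaluating BET with the exponent b-1 gives different results.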

-->BET(0.25,2.5,3.5) ans = .1737696

-->1-BET(0.75,2.5,3.5) ans = .4250376

-->BET(0.8,2.5,3.5)-BET(0.3,2.5,3.5) ans = .3428804
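The beta pdf requires the exponent β-1 on (1-x). Integrating over 0 < x < 1 shows why this matters. With the exponent β-1 the integral is 1. With the exponent β, the same normalizing constant Γ(α+β)/(Γ(α)Γ(β)) yields β/(α+β) instead. The Python sketch below uses Simpson's rule and helper names of our own to show this for α = 2.5, β = 3.5. It then evaluates CDF values from the pdf with exponent β-1; these are not expected to match the values listed above:

```python
import math

def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

def beta_pdf(x, a, b, last_exponent):
    """Gamma(a+b)/(Gamma(a)Gamma(b)) * x^(a-1) * (1-x)^last_exponent."""
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return c * x**(a - 1) * (1 - x)**last_exponent

a, b = 2.5, 3.5
total_correct = simpson(lambda x: beta_pdf(x, a, b, b - 1), 0.0, 1.0)
total_b = simpson(lambda x: beta_pdf(x, a, b, b), 0.0, 1.0)
print(total_correct, total_b, b / (a + b))

def BET(x):
    """Beta CDF from the pdf with exponent b - 1."""
    return simpson(lambda t: beta_pdf(t, a, b, b - 1), 0.0, x)

print(BET(0.25), 1 - BET(0.75), BET(0.8) - BET(0.3))
```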

The Weibull distribution


Plots of the pdf and CDF for the Weibull distribution with α = 2 and β = 3 are obtained as follows:

-->xx=(0:0.01:2);yy=w(xx,2,3);
-->plot(xx,yy,'x','fX(x)','Weibull distribution')

-->yyy=[];for x=0:0.01:2, yyy=[yyy W(x,2,3)]; end;

-->plot(xx,yyy,'x','FX(x)','Weibull distribution')

The following probability calculations for the Weibull distribution with α = 2 and β = 3 are presented next: P(X<1.5) = FX(1.5), P(X>0.6) = 1 - P(X<0.6) = 1 - FX(0.6), and P(0.5<X<1.2) = FX(1.2) - FX(0.5):

-->W(1.5,2,3) ans = .9988291

-->1-W(0.6,2,3) ans = .6492094

-->W(1.2,2,3)-W(0.5,2,3) ans = .7472451
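The Weibull probabilities above follow directly from FX(x) = 1 - exp(-αx^β). The Python sketch below, with helper names of our own, reproduces them. It also checks the mean and variance formulas given earlier by integrating the survival function numerically with Simpson's rule, using E[X] = ∫₀^∞ [1 - FX(x)] dx and E[X²] = ∫₀^∞ 2x[1 - FX(x)] dx. The integrals are truncated at x = 5, where 1 - FX is negligible for α = 2, β = 3.

```python
import math

def weibull_cdf(x, alpha, beta):
    """Weibull CDF in the parameterization used above: 1 - exp(-alpha * x**beta)."""
    return 1 - math.exp(-alpha * x**beta) if x > 0 else 0.0

def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

alpha, beta = 2.0, 3.0
print(weibull_cdf(1.5, alpha, beta))                                   # P(X < 1.5)
print(1 - weibull_cdf(0.6, alpha, beta))                               # P(X > 0.6)
print(weibull_cdf(1.2, alpha, beta) - weibull_cdf(0.5, alpha, beta))   # P(0.5 < X < 1.2)

surv = lambda x: 1 - weibull_cdf(x, alpha, beta)      # survival function 1 - FX(x)
mean_num = simpson(surv, 0.0, 5.0)
ex2_num = simpson(lambda x: 2 * x * surv(x), 0.0, 5.0)
mean_formula = alpha**(-1 / beta) * math.gamma(1 + 1 / beta)
var_formula = alpha**(-2 / beta) * (math.gamma(1 + 2 / beta) - math.gamma(1 + 1 / beta)**2)
print(mean_num, mean_formula, ex2_num - mean_num**2, var_formula)
```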


Continuous probability distributions used in statistical inference

Statistical inference is the process by which sample data are used to provide information about the population. Some of the products of statistical inference are the generation of confidence intervals and the testing of hypotheses for population parameters. There are a number of continuous probability distributions of great utility in statistical inference. These are:

• the standard normal distribution
• the Student's t distribution
• the chi-square (χ²) distribution
• the F distribution

The probability density functions (pdf) for these distributions are presented below.

The Normal distribution

The expression for the normal distribution pdf is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right],$$

where µ is the mean and σ² the variance of the distribution.

SCILAB provides function cdfnor for operations with the cumulative distribution function for the normal distribution. Function cdfnor was presented in detail in Chapter …. To find on-line information on this function use the command:

-->help cdfnor

The Student-t distribution

The Student-t, or simply the t-, distribution has one parameter ν, known as the degrees of freedom. The probability density function (pdf) is given by

$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\;\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \quad -\infty < t < \infty.$$

The following SCILAB commands can be used to plot the pdf for the Student t distribution with ν = 6:

-->deff('[f]=fT(t,nu)',...
-->'f=gamma((nu+1)./2).*(1+t.^2./nu).^(-(nu+1)/2)/(sqrt(%pi*nu)*gamma(nu/2))')

-->tt=[-4:0.1:4];ff=fT(tt,6);

-->plot(tt,ff,'t','fT(t)','Student t - nu = 6')


SCILAB provides function cdft for operations with the cumulative distribution function of the Student's t distribution. The calls to the function are as follows:

[P,Q]=cdft("PQ",T,Df)
[T]=cdft("T",Df,P,Q)
[Df]=cdft("Df",P,Q,T)

In these function calls, P = P(TT<T), Q = 1 - P, and Df = degrees of freedom = ν, with TT ~ Student-t(Df).

-->[P,Q] = cdft("PQ",0.4,6) //Probability calculation
 Q  =
    .3515041
 P  =
    .6484959

-->t = cdft("T",8,0.45,1-0.45) //Inverse CDF calculation
 t  =
  - .1297073

-->nu = cdft("Df",0.7,0.3,0.8) //Obtaining degrees of freedom
 nu  =
    .7716700

A plot of the CDF for the Student t distribution can be produced using the following commands:

-->xx=[-4:0.1:4];

-->yy=[];for x=-4:0.1:4, yy=[yy cdft('PQ',x,6)]; end;

-->plot(xx,yy,'t','FT(t)','Student t - nu = 6')
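Because the t pdf is symmetric about 0, P(T<t) = 1/2 + ∫₀ᵗ fT(s) ds. The Python sketch below uses this identity to reproduce the first cdft result above, P(T<0.4) ≈ 0.6484959 for ν = 6. It also checks the inverse result t ≈ -0.1297073 for ν = 8, P = 0.45. Simpson's rule and the helper names are our own choices, not what cdft does internally.

```python
import math

def simpson(f, lo, hi, n=1000):
    """Composite Simpson's rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

def t_pdf(t, nu):
    """Student t pdf with nu degrees of freedom (same formula as fT above)."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t_cdf(t, nu):
    """P(T < t) using symmetry: 0.5 plus the integral of the pdf from 0 to t."""
    return 0.5 + simpson(lambda s: t_pdf(s, nu), 0.0, t)

print(t_cdf(0.4, 6))           # compare with P from cdft("PQ",0.4,6)
print(t_cdf(-0.1297073, 8))    # should be close to 0.45
```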


The Chi-squared (χ²) distribution

The chi-squared (χ²) distribution has one parameter ν, known as the degrees of freedom. The probability density function (pdf) is given by

$$f(x) = \frac{1}{2^{\nu/2}\,\Gamma\left(\frac{\nu}{2}\right)}\, x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}, \quad \nu > 0,\ x > 0.$$

A plot of the CDF for the chi-square distribution with ν = 4 can be obtained by using:

-->xx = [0:0.1:10];

-->yy=[];for x=0:0.1:10, yy=[yy cdfchi('PQ',x,4)]; end;

-->plot(xx,yy,'x','FX(x)','Chi-square - nu = 4')

SCILAB provides function cdfchi for operations with the cumulative distribution function of the χ² (chi-square) distribution. The calls to this function include:

[P,Q]=cdfchi("PQ",X,Df)
[X]=cdfchi("X",Df,P,Q)
[Df]=cdfchi("Df",P,Q,X)


In these calls to function cdfchi, P = P(XX<X), Q = 1 - P, and Df = degrees of freedom, with XX ~ χ²(Df).

-->[P,Q] = cdfchi("PQ",1,10) //Probability calculation
 Q  =
    .9998279
 P  =
    .0001721

-->[P,Q] = cdfchi("PQ",0.2,10) //Probability calculation
 Q  =
    .9999999
 P  =
    7.668E-08

-->chi2 = cdfchi("X",4,0.4,0.6) //Inverse CDF calculation
 chi2  =
    2.7528427

-->nu = cdfchi("Df",0.4,0.6,2.7) //Calculating degrees of freedom
 nu  =
    3.9409085
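For an even number of degrees of freedom, the chi-square CDF has the closed form P(X≤x) = 1 - e^(-x/2) Σ_{k=0}^{ν/2-1} (x/2)^k/k!. This holds because χ²(ν) is then a gamma distribution with integer shape ν/2 and scale 2. The Python sketch below uses chi2_cdf_even, a helper of our own that is valid only for even ν. It reproduces three of the cdfchi results above:

```python
import math

def chi2_cdf_even(x, nu):
    """Chi-square CDF for even nu: 1 - exp(-x/2) * sum_{k < nu/2} (x/2)^k / k!."""
    if nu % 2:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    z = x / 2
    return 1 - math.exp(-z) * sum(z**k / math.factorial(k) for k in range(nu // 2))

print(chi2_cdf_even(1, 10))          # compare with P from cdfchi("PQ",1,10)
print(chi2_cdf_even(0.2, 10))        # compare with P from cdfchi("PQ",0.2,10)
print(chi2_cdf_even(2.7528427, 4))   # should be close to 0.4, cf. cdfchi("X",4,0.4,0.6)
```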

A plot of the pdf for the chi-square distribution with ν = … is obtained by using:

-->deff('[f]=fC(x,nu)',...
-->'f=x.^(nu/2-1).*exp(-x./2)/(2.^(nu/2).*gamma(nu./2))')

-->cc=[0:0.1:30];ff=fC(cc,10);
-->plot(cc,ff,'chi^2','fC(chi^2)','Chi-square - nu = 10')

The F distribution

The F distribution has two parameters, νN = numerator degrees of freedom and νD = denominator degrees of freedom. The probability density function (pdf) is given by

$$f(x) = \frac{\Gamma\left(\frac{\nu_N+\nu_D}{2}\right)\left(\frac{\nu_N}{\nu_D}\right)^{\nu_N/2} x^{\frac{\nu_N}{2}-1}}{\Gamma\left(\frac{\nu_N}{2}\right)\Gamma\left(\frac{\nu_D}{2}\right)\left(1+\frac{\nu_N x}{\nu_D}\right)^{\frac{\nu_N+\nu_D}{2}}}, \quad \nu_D > 0,\ \nu_N > 0,\ x > 0.$$

A plot of the F-distribution pdf for νN = 4, νD = 6 is obtained by using:

-->deff('[f]=fF(F,nuN,nuD)',...
-->'f=gamma((nuN+nuD)./2).*(nuN./nuD).^(nuN./2).*F.^(nuN./2-1)./...
-->(gamma(nuN./2).*gamma(nuD./2).*(1+nuN.*F./nuD).^((nuN+nuD)./2))')
-->xx=[0:0.1:10];ff=fF(xx,4,6);
-->plot(xx,ff,'F','fF(F)','F distribution - nuNum = 4 - nuDen = 6')

SCILAB provides the function cdff for operations with the cumulative distribution function of the F distribution. The calls to this function are:

[P,Q]=cdff("PQ",F,Dfn,Dfd)
[F]=cdff("F",Dfn,Dfd,P,Q)
[Dfn]=cdff("Dfn",Dfd,P,Q,F)
[Dfd]=cdff("Dfd",P,Q,F,Dfn)

In these calls of the function cdff, P = P(FF<F), Q = 1 - P, and Dfn and Dfd are the degrees of freedom in the numerator and denominator of F.

-->[P,Q] = cdff("PQ",1.2,6,12) //Probability calculation
 Q  =
    .3697351
 P  =
    .6302649

-->F = cdff("F",10,2,0.4,0.6) //Inverse CDF calculation
 F  =
    .9944093

-->nuNum= cdff('Dfn',5,0.4,0.6,0.8) //calculating degrees of freedom
 nuNum  =
    5.3847039


A plot of the F-distribution CDF is produced through the following SCILAB commands:

-->xx = [0:0.1:10];

-->yy=[];for x=0:0.1:10, yy=[yy cdff('PQ',x,4,6)]; end;

-->plot(xx,yy,'x','FX(x)','F - nuNum = 4 - nuDen = 6')
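The t and F distributions are linked. If T follows the Student t distribution with ν degrees of freedom, then T² follows the F distribution with 1 and ν degrees of freedom, so P(-t<T<t) = P(F<t²). The sketch below uses this identity as a cross-check between cdft and cdff. It assumes a recent Scilab release, and the values t = 1.5 and ν = 6 are arbitrary.

```scilab
t  = 1.5;   // arbitrary positive value of t
nu = 6;     // degrees of freedom

// Two-sided probability P(-t < T < t) from the t distribution
[Pt, Qt] = cdft("PQ", t, nu);
p_two_sided = 1 - 2*Qt;

// Same probability written as P(F < t^2), with F ~ F(1, nu)
[Pf, Qf] = cdff("PQ", t^2, 1, nu);

mprintf("from cdft: %.7f   from cdff: %.7f\n", p_two_sided, Pf);
```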

Applications of the normal distribution in data analysis

The normal distribution, also known as the bell curve, appears commonly when determining the frequency distribution of different types of physical measurements. We first introduced the normal distribution in Chapter 14 as an example of a continuous probability distribution. In this section we present some applications of this probability distribution in data analysis. The probability density function, pdf, for a general normal distribution, X, with a mean value, µ, and a standard deviation, σ, is given by

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad \sigma > 0,\; -\infty < x < \infty.$$

The standard normal distribution has mean value µ = 0 and standard deviation σ = 1. SCILAB provides function cdfnor for operations with the normal cumulative distribution function. The different forms of the call to the function were presented in detail in Chapter …, and are repeated here:

[p,q] = cdfnor("PQ",x,mu,sigma)
[x] = cdfnor("X",mu,sigma,p,q)
[mu] = cdfnor("Mean",sigma,p,q,x)
[sigma] = cdfnor("Std",p,q,x,mu)

where mu is the mean value (µ), sigma is the standard deviation (σ), p = P(X<x), and q = 1 - p = P(X>x). The first argument in the different calls to cdfnor is a string that indicates the type of result expected:

"PQ" - to request probabilities p and q
"X" - to request a value of the normal variable
"Mean" - to request the mean of the distribution
"Std" - to request the standard deviation of the distribution
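The four forms of cdfnor can be checked against each other. The minimal sketch below assumes a recent Scilab release and uses the illustrative values µ = 150, σ = 50 and x = 200. Because x is one standard deviation above the mean, p should be close to 0.8413, and each inverse call should return the value that was fed into the "PQ" call.

```scilab
mu = 150; sigma = 50; x = 200;              // illustrative values

[p, q]  = cdfnor("PQ", x, mu, sigma);       // p = P(X<200), q = 1 - p
x_back  = cdfnor("X", mu, sigma, p, q);     // should return 200
mu_back = cdfnor("Mean", sigma, p, q, x);   // should return 150
s_back  = cdfnor("Std", p, q, x, mu);       // should return 50

mprintf("p = %.7f  x = %.4f  mu = %.4f  sigma = %.4f\n", p, x_back, mu_back, s_back);
```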

Because the normal distribution is commonly found in the analysis of physical measurements, it is often recommended that you check whether your data set (your sample) follows the normal distribution. In this section we present two graphical approaches for this check. The first consists of superimposing a normal distribution pdf, based on the mean value and standard deviation of the sample, on top of the sample histogram. The second consists of plotting the data against what are commonly known as their normal scores. The resulting graph is equivalent to plotting the data on normal probability paper, i.e., paper with one scale representing the normal probability corresponding to the data set. These two approaches are described next.

Plotting a histogram and its corresponding normal curve

The purpose of this plot is to visually check whether the histogram of a sample, with a suitable number of classes, matches a superimposed normal curve. For that purpose we propose the following SCILAB user-defined function, histnorm:

function [chi2,cmark,fcount]=histnorm(x, xclass)
//This function calculates the frequency distribution
//for the data in (row) vector x according to the
//class boundaries contained in the (row) vector
//xclass. It also produces a histogram of the
//data and the normal curve that best fits the data.
//
//Typical call: [chi2,cm,f] = histnorm(x,xclass)
//where cm = class marks, f = frequency count,
//chi2 = chi-square parameter for the fitting

[m n] = size(x);           //Sample size
[m nB] = size(xclass);     //Number of class boundaries
k = nB - 1;                //Number of classes

//Calculate class marks
cmark = zeros(1,k);
for ii = 1:k
  cmark(ii) = 0.5*(xclass(ii)+xclass(ii+1));
end

//Initialize frequency counts to zero
fcount = zeros(1,k);
fbelow = 0; fabove = 0;

//Accumulate frequency counts; a value equal to the
//last class boundary is counted in the last class
for ii = 1:n
  if x(ii) < xclass(1)
    fbelow = fbelow + 1;
  elseif x(ii) > xclass(nB)
    fabove = fabove + 1;
  elseif x(ii) == xclass(nB)
    fcount(k) = fcount(k) + 1;
  else
    for jj = 1:k
      if x(ii) >= xclass(jj) & x(ii) < xclass(jj+1)
        fcount(jj) = fcount(jj) + 1;
      end
    end
  end
end

//Calculate xbar, sx, and the expected class frequencies
nn = sum(fcount);
xbar = mean(x); sx = st_deviation(x);
xmin = min(xclass); xmax = max(xclass);

pk = [];
for j = 1:k+1
  pk = [pk cdfnor("PQ",xclass(j),xbar,sx)];
end;
p_in_classes = pk(k+1)-pk(1);     //normal probability covered by the classes
pxclass = pk(2:k+1) - pk(1:k);    //normal probability of each class
fc = pxclass*nn/p_in_classes;     //expected counts, scaled so that sum(fc) = nn

//Chi-square parameter
chi2 = 0;
for j = 1:length(fc)
  chi2 = chi2 + (fcount(j)-fc(j))^2/fc(j);
end;

//Produce normal curve for data (100 sub-intervals;
//assumes classes of equal width)
Dx = (xmax-xmin)/100;
xx = linspace(xmin,xmax,101);
xxx = xx(1:100) + Dx/2;
pkk = [];
for j = 1:101
  pkk = [pkk cdfnor("PQ",xx(j),xbar,sx)];
end;
pp = pkk(2:101) - pkk(1:100);
fcc = pp*nn*(100/k)/p_in_classes;

//Determine plot rectangle
ymin = 0;
ymaxf = max(fcount); ymaxy = max(fcc);
ymax = max(ymaxf,ymaxy);
ymax = int(1.1*ymax);
plotrectangle = [xmin ymin xmax ymax];

//Plot the histogram and normal curve
xp = xclass(1:k);
xset('window',1); xbasc(1);
plot2d2('onn',xclass',[fcount fcount(k)]',[1],'011','y',plotrectangle);
plot2d3('onn',xp',fcount',[1],'000');
plot2d(xxx',fcc',[2],'000');
xtitle('Histogram with normal curve','x','frequency');

//end function histnorm

Notice that this function uses SCILAB function cdfnor to calculate values of the cumulative distribution function for the normal distribution where needed. The general call to the function is:

[chi2,cm,f] = histnorm(x,xclass)

which returns the chi-square parameter, chi2, the class marks, cm, and the frequency count, f. The chi-square parameter is defined as

$$\chi^2 = \sum_{i=1}^{k} \frac{\left(f_i - \mathit{fc}_i\right)^2}{\mathit{fc}_i},$$

where fi is the actual frequency count for the ith class, fci is the estimated frequency count obtained from the normal distribution for the ith class, and k is the number of classes in the frequency distribution. The χ2 parameter approximately follows the chi-square distribution, and it is used to check the hypothesis that the frequency distribution under consideration does indeed follow the normal distribution. The number of degrees of freedom is ν = k-1 when the parameters of the normal distribution are known beforehand; because histnorm estimates the mean and standard deviation from the sample, ν = k-3 applies here. The subject of hypothesis testing is developed in Chapter …; therefore, we delay until then the use of the parameter returned from function histnorm.
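To illustrate the formula, the sketch below evaluates χ2 for the ten frequency counts reported later in this section for the rand-generated sample. As a hypothetical choice made only to exercise the formula, it compares those counts with the equal counts of 20 per class expected under a uniform distribution, rather than with a fitted normal curve. No parameters are estimated in that case, so the last call uses cdfchi with ν = k-1 = 9 to obtain the probability of exceeding the computed value. The sketch assumes a recent Scilab release.

```scilab
f  = [20 18 27 18 23 22 16 18 14 24];  // observed counts (10 classes)
fc = 20*ones(1, 10);                   // hypothetical expected counts (uniform)

// chi-square parameter: sum of (f - fc)^2 / fc over the k classes
chi2 = sum((f - fc).^2 ./ fc);

// probability of a value at least this large, nu = k - 1 = 9
[P, Q] = cdfchi("PQ", chi2, 9);

mprintf("chi2 = %.4f   P(X > chi2) = %.4f\n", chi2, Q);
```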


Application of the function histnorm

In this example we apply function histnorm to a set of 200 data values between 0 and 100 generated using function rand as follows:

-->x = int(100*rand(1,200));

First, we check the minimum and maximum value of the data:

-->min(x), max(x)
 ans  =
    0.
 ans  =
    99.

A set of class boundaries of 0, 10, 20, …, 100, will produce 10 classes for this sample:

-->xclass = [0:10:100];

Next, we load the function histnorm and apply the function to the data stored in x using the class boundaries stored in xclass:

-->getf('histnorm')

-->histnorm(x,xclass)
 ans  =
    1.9583514

The value returned is the chi-square parameter for the normal curve fitting. The plot of the histogram with the superimposed normal curve is:

A second example for the same data sample is presented next, in which we use 20 classes, with class boundaries 0, 5, 10, …, 95, 100, to classify the data:

-->xclass=[0:5:100];

The results from function histnorm are the chi-square parameter and the following plot:

-->histnorm(x,xclass)
 ans  =
    2.0146916


The function can be invoked with three output variables on the left-hand side to produce not only the chi-square parameter and the plot, but also the class marks and the frequency count of the sample:

-->[X2,cm,f] = histnorm(x,[0:10:100])
 f  =
! 20. 18. 27. 18. 23. 22. 16. 18. 14. 24. !
 cm  =
! 5. 15. 25. 35. 45. 55. 65. 75. 85. 95. !
 X2  =
    1.9583514

Notice that in the two graphs shown above, the normal curve does not fit the histograms very well. The main reason is that the data were generated from a uniform distribution (i.e., using the default settings of SCILAB's function rand) and not from a normal distribution. Later in this chapter we deal with the generation of data from distributions other than the uniform distribution, and we will use function histnorm to check how well those data fit the normal distribution.

Plotting data against their normal scores

Assume that the continuous random variable X follows the normal distribution with mean µ and standard deviation σ. Given a probability p (0<p<1) such that P(X<x) = p with X ~ N(µ,σ), the value of x is referred to as the normal score for p. [Note: In some references in the statistical literature the normal scores are related to a probability α = 1 - p, so that if P(X>xα) = α, with X ~ N(µ,σ), xα is the normal score for α.]

Suppose that we have an ordered data set, xp = {xp(1) < xp(2) < … < xp(n)}, that follows the normal distribution with mean and standard deviation equal to the sample's mean (x̄) and standard deviation (sx). Also, assume that the probability of the interval [xp(i), xp(i+1)] is the same for all values of i = 1, 2, …, n-1, say P(xp(i) < X < xp(i+1)) = q, and that P(X<xp(1)) = P(X>xp(n)) = q. Thus, the entire area under the normal curve is split into n+1 sub-regions of the same area q, as illustrated in the figure below.

The value of q is, therefore, q = 1/(n+1), and we can write:

P(X<xp(1)) = q, P(X<xp(2)) = 2q, …, P(X<xp(i)) = iq, …, P(X<xp(n)) = nq.

In general,

P(X<xp(i)) = i/(n+1) = p(i),

for i = 1, 2, …, n. The values p(i) are referred to as plotting positions, because they are used to obtain the corresponding normal scores xp(i).

Given an ordered data set, x = {x(1) < x(2) < … < x(n)}, of size n, we can generate a vector of plotting positions, p(i) = i/(n+1), and obtain a set of normal scores xp(i) by using the function call cdfnor("X",x̄,sx,p(i),1-p(i)), where x̄ and sx are the mean and standard deviation of the data set. If the given data set, x, does indeed follow the normal distribution with mean µ = x̄ and standard deviation σ = sx, a plot of normal scores xp versus the original data x should produce a straight line.
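Because cdfnor accepts vectors of equal size, the plotting-position recipe translates into a few vectorized lines. The sketch below assumes a recent Scilab release, where gsort and stdev play the roles of the older sortup and st_deviation used in this chapter, and the five-point sample is made up for illustration. With n = 5 the middle plotting position is 3/6 = 0.5, so the middle normal score equals the sample mean, and the scores are symmetric about it.

```scilab
x  = [7 2 9 4 5];                // small illustrative sample
xs = gsort(x, "g", "i");         // ordered data, increasing
n  = length(xs);
xbar = mean(xs);                 // sample mean
sx   = stdev(xs);                // sample standard deviation

pp = (1:n) / (n + 1);            // plotting positions i/(n+1)
// normal scores: all four arguments of cdfnor have the same size
xp = cdfnor("X", xbar*ones(pp), sx*ones(pp), pp, 1 - pp);

disp([xs; xp]);                  // first row: data, second row: normal scores
```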

A function to produce a plot of data versus normal scores

The following function, normplot, takes as input a data set, or sample, x = {x(1), x(2), …, x(n)}, sorts it in increasing order, obtains the plotting positions p(i), calculates the normal scores xp(i), and plots the normal scores versus the ordered data. It also plots the straight line y = x, which represents an exact fit to a normal distribution. The closer the plot of normal scores vs. data is to this straight line, the more closely the data set follows the normal distribution.

function normplot(x)
//This function produces a normal probability
//paper plot for the data in (row) vector x

xx = sortup(x);          //order sample in increasing order
xm = mean(xx);           //mean of sample
sx = st_deviation(xx);   //standard deviation of sample
nn = length(x);          //sample size

//Calculating plotting positions and normal scores
pp = []; xp = [];
for j = 1:nn
  pp = [pp j/(nn+1)];
  xp = [xp cdfnor("X",xm,sx,pp(j),1-pp(j))];
end;

//Determine the plotting rectangle
xmin1 = min(xx); xmin2 = min(xp); xmin = min(xmin1,xmin2);
xmax1 = max(xx); xmax2 = max(xp); xmax = max(xmax1,xmax2);
ymin = min(xp); ymax = max(xp);

//Produce a graduated scale
[xminp, xmaxp, nxp] = graduate(xmin,xmax);
[yminp, ymaxp, nyp] = graduate(ymin,ymax);

//Plot scores vs. data and exact normal distribution fitting
plot2d(xp',xp',[1],'011','y',[xminp yminp xmaxp ymaxp])
xset('mark',-9,2);
plot2d(xx',xp',[-9],'011','y',[xminp yminp xmaxp ymaxp])
xtitle('Normal probability plot','x','z');

//end function normplot

An application of this function is shown next. First, we produce a sample of 200 data points using a uniform distribution. Next, we load function normplot and produce the normal probability plot.

-->x =int(100*rand(1,200));

-->getf('normplot')

-->normplot(x)

The resulting graph shows that the data do not follow the normal distribution, particularly near the lowest and highest values of the data set.

The lognormal distribution

If the random variable Y = ln(X) follows the normal distribution with mean µY = µln(X) and standard deviation σY = σln(X), then we say that the random variable X follows the lognormal distribution. The probability density function of the lognormal distribution is given by

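$$f_X(x) = \frac{1}{x\,\sigma_{\ln(X)}\sqrt{2\pi}}\exp\left(-\frac{\left(\ln x - \mu_{\ln(X)}\right)^2}{2\sigma_{\ln(X)}^2}\right), \qquad x > 0.$$

The mean and variance of X are

$$\mu_X = \exp\left(\mu_{\ln(X)} + \frac{1}{2}\sigma_{\ln(X)}^2\right), \qquad \mathrm{Var}(X) = \exp\left(2\mu_{\ln(X)} + \sigma_{\ln(X)}^2\right)\left[\exp\left(\sigma_{\ln(X)}^2\right) - 1\right].$$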
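The closed-form mean and variance can be checked numerically by integrating x·fX(x) and (x - µX)²·fX(x) over a fine grid. The minimal sketch below assumes a recent Scilab release with inttrap. The upper limit of 60 and the grid spacing of 0.001 are arbitrary choices that make the truncated tails negligible for µln(X) = 1.2 and σln(X) = 0.5.

```scilab
mu = 1.2; sigma = 0.5;                      // parameters of ln(X)
deff('[ff]=fX(x,mu,sigma)', ...
     'ff=exp(-(log(x)-mu).^2./(2.*sigma.^2))./(sigma.*x.*sqrt(2.*%pi))');

// closed-form mean and variance of X
muX  = exp(mu + sigma^2/2);
varX = exp(2*mu + sigma^2) * (exp(sigma^2) - 1);

// numerical check with the trapezoidal rule on a fine grid
x = 0.001:0.001:60;
f = fX(x, mu, sigma);
muNum  = inttrap(x, x .* f);
varNum = inttrap(x, (x - muX).^2 .* f);

mprintf("mean: %.4f (formula) vs %.4f (numerical)\n", muX, muNum);
mprintf("var : %.4f (formula) vs %.4f (numerical)\n", varX, varNum);
```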

For calculating probabilities we can use the normal distribution cdf by first calculating the natural log of the variable. For example, if X ~ lognormal(µln(X) = 1.2, σln(X) = 0.5), to calculate the probability P(X<2) use P(X<2) = P(ln(X)<ln(2)) = P(Y<0.6931), where Y ~ N(1.2, 0.5). We can use function cdfnor to calculate this probability in SCILAB as follows:

-->cdfnor("PQ",log(2),1.2,0.5)
 ans  =
    .1553616

Suppose that we want to find the inverse cumulative distribution function, i.e., a value x for which P(X<x) = 0.35, given µln(X) = 1.2 and σln(X) = 0.5. We can use:

-->cdfnor("X",1.2,0.5,0.35,0.65)
 ans  =
    1.0073398

The previous result actually gives a value of Y = ln(X) with Y ~ N(1.2, 0.5). The corresponding value of X is calculated as X = exp(Y), i.e.,

-->exp(ans)
 ans  =
    2.7383068

A graph of the lognormal probability density function for µln(X) = 1.2, σln(X) = 0.5 is produced by using:

-->deff('[ff]=fX(x,mu,sigma)',...
-->'ff=exp(-(log(x)-mu).^2./(2.*sigma.^2))./(sigma.*x.*sqrt(2.*%pi))')

-->mu=1.2;sigma=0.5;xx=[0.01:0.1:10];yy=fX(xx,mu,sigma);

-->plot(xx,yy,'x','fX(x)','Log-normal pdf')


Generating synthetic data

In this section we present pre-defined and user-defined functions that allow us to generate data that follow a particular probability distribution. We refer to such data as synthetic data.

Generating normally-distributed synthetic data

In the examples presented in the previous section on applications of the normal distribution we generated data by using the function rand, which, by default, produces random data uniformly distributed in the interval [0,1]. The function rand can also be used to produce normally distributed data, z, that follow the standard normal distribution, i.e., Z ~ N(0,1), by, first, using the function call

rand('normal')

and next using the function call

rand(n,m)

where n and m are integers. The last call to function rand will produce a matrix of n rows and m columns whose elements are random numbers following the standard normal distribution. Recalling that the standardized normal variate is defined as

Z = (X-µ)/σ,

values of x can be obtained from values of z by using

x = µ + σz.

The following example illustrates how to use function rand to produce 200 data points that follow the normal distribution with mean µ = 150 and standard deviation σ = 50:

-->x = 150 + 50.*rand(1,200);
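The transformation x = µ + σz can be checked on a larger sample. The sketch below assumes a recent Scilab release in which rand accepts the key "normal" as its last argument, which draws standard normal numbers in the same call. The sample size of 10000 is arbitrary. The sample mean and standard deviation should come out close to 150 and 50.

```scilab
mu = 150; sigma = 50;
n  = 10000;                  // arbitrary sample size

z = rand(1, n, "normal");    // standard normal numbers, Z ~ N(0,1)
x = mu + sigma*z;            // x = mu + sigma*z

mprintf("sample mean = %.2f, sample std = %.2f\n", mean(x), stdev(x));
```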

To verify that the data do indeed follow the normal distribution, we apply functions histnorm and normplot to this data set. To use function histnorm, we first determine the minimum and maximum values of the data set to decide which class boundaries to use in the histogram:

-->xmin = min(x), xmax = max(x)
 xmin  =
    34.558873
 xmax  =
    317.59609

We select for class boundaries the values 25, 50, 75, …, 300, 325:

-->xclass = [25:25:325];

The resulting histogram and superimposed normal curve are shown next:

-->histnorm(x,xclass);


The fitting of the histogram to the corresponding normal curve is relatively good, in spite of the apparent discrepancy towards the center of the data. We can also use function normplot to check the normality of the data as follows:

-->normplot(x)

The resulting normal probability plot is:

The plot suggests that the data follows the normal distribution for most of the range except forvalues larger than about 220.

Additional applications of function rand

SCILAB's function rand, like most numerical random number generators, uses a number, known as the seed, to produce random numbers. To find out the current value of the seed in function rand use:

-->rand('seed')
 ans  =
    8.096E+08

To find out which type of random number generator is active in function rand (i.e., normal or uniform) use:

-->rand('info')
 ans  =
    normal


To change the function rand back to uniform use:

-->rand('uniform')

To change the seed to the number 15, for example, use:

-->rand('seed',15)

The first 10 random numbers generated by rand after seeding it with a value of 15 are:

-->rand(1,10)
 ans  =
! .1018111 .5348560 .9628528 .1235873 .6667947 .4106913 .6578733 .6756193 .1201851 .0268646 !

After generating those 10 random numbers the value of the seed has changed to:

-->rand('seed')
 ans  =
    57691269.

If, for some reason, you need to restart the previous sequence of random numbers, you can simply re-seed function rand with the value of 15:

-->rand('seed',15)

Check that you get the same sequence of random numbers by comparing the following 5 random numbers with the first 5 random numbers generated earlier after using seed = 15:

-->rand(1,5)
 ans  =
! .1018111 .5348560 .9628528 .1235873 .6667947 !
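Reproducibility after re-seeding can also be checked programmatically rather than by eye. The sketch below assumes a recent Scilab release. It seeds rand with 15 twice and compares the two sequences element by element.

```scilab
rand("uniform");            // make sure the uniform generator is active
rand("seed", 15);
a = rand(1, 5);             // first five numbers after seeding with 15

rand("seed", 15);           // re-seed with the same value
b = rand(1, 5);             // should repeat the same five numbers

disp(a); disp(b);
```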

SCILAB function for generating synthetic data

SCILAB provides function grand (generating random numbers) to generate a vector or matrix with data that follow, among others, the following distributions: binomial, Poisson, gamma, beta, exponential, uniform integer, uniform real, normal, chi-squared, and F. Two general calls to the function are:

[x] = grand(m,n,dist_type,dist_parameters)
[x] = grand(A,dist_type,dist_parameters)

where dist_type is a string identifying the type of distribution, and dist_parameters is a list of the parameters defining the distribution. In the first form of the call the values m and n represent the number of rows and columns of a matrix to be generated containing random numbers that follow the desired distribution. In the second form of the function call an existing matrix A is provided so that the function generates a new matrix with the same dimensions as A containing the random numbers that follow the desired distribution.


The following strings identify the type of distribution requested. We also identify the parameters required for each distribution:

String   Distribution       Parameters
'bin'    binomial           N, P
'poi'    Poisson            λ
'bet'    beta               α, β
'gam'    gamma              α = shape, β = rate
'exp'    exponential        µ = mean (= 1/β)
'nor'    normal             µ, σ
'chi'    chi-square         ν
'f'      F                  νN, νD
'uin'    uniform integer    a, b
'unf'    uniform real       a, b

The specific function calls for each probability distribution are shown next:

Binomial: x=grand(m,n,'bin',N,P), x=grand(A,'bin',N,P)
Poisson: x=grand(m,n,'poi',λ), x=grand(A,'poi',λ)
Beta: x=grand(m,n,'bet',α,β), x=grand(A,'bet',α,β)
Gamma: x=grand(m,n,'gam',α,β), x=grand(A,'gam',α,β)
Exponential: x=grand(m,n,'exp',µ), x=grand(A,'exp',µ)
Normal: x=grand(m,n,'nor',µ,σ), x=grand(A,'nor',µ,σ)
Chi-square: x=grand(m,n,'chi',ν), x=grand(A,'chi',ν)
F-distribution: x=grand(m,n,'f',νN,νD), x=grand(A,'f',νN,νD)
Uniform integer: x=grand(m,n,'uin',a,b), x=grand(A,'uin',a,b)
Uniform real: x=grand(m,n,'unf',a,b), x=grand(A,'unf',a,b)
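The parameter list for 'gam' deserves a check. Recent Scilab documentation describes the second gamma parameter as a rate, so that the mean is α/β, and the exponential parameter as the mean. The sketch below assumes a recent Scilab release and compares sample moments from 100000 values with those two readings. The gamma example later in this section, whose values stay below about 2.7 for α = 2 and β = 3, is also consistent with a rate.

```scilab
n = 100000;                       // large sample so moments are stable
g = grand(1, n, "gam", 2, 3);     // shape 2, rate 3: mean 2/3, variance 2/9
e = grand(1, n, "exp", 5);        // exponential with mean 5

mprintf("gamma: mean %.4f (expect %.4f), var %.4f (expect %.4f)\n", ...
        mean(g), 2/3, variance(g), 2/9);
mprintf("exponential: mean %.4f (expect 5)\n", mean(e));
```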

Examples of synthetic data generation using function grand

The following examples demonstrate how to use function grand to generate sets of 200 data points that follow specific probability distributions. After the data are generated we determine their maximum and minimum values, select class boundaries for histograms of the data, and use functions histnorm and normplot to check how close the data are to normality. We start the exercises by loading these two functions:

-->getf('histnorm');getf('normplot');

Binomial data

-->x=grand(1,200,'bin',20,0.35);xmin=min(x),xmax=max(x)
 xmin  =
    2.
 xmax  =
    14.

-->xclass=[2:2:14];xset('window',1);histnorm(x,xclass);

-->xset('window',2);normplot(x);

Poisson data

-->x=grand(1,200,'poi',12.5);xmin=min(x),xmax=max(x)
 xmin  =
    4.
 xmax  =
    23.

-->xclass=[4:2:24];xset('window',1);histnorm(x,xclass);



Beta data

-->x=grand(1,200,'bet',2,3);xmin=min(x),xmax=max(x)
 xmin  =
    .0480813
 xmax  =
    .9132797

-->xclass=[0:0.1:1];xset('window',1);histnorm(x,xclass);



Gamma data

-->x=grand(1,200,'gam',2,3);xmin=min(x),xmax=max(x)
 xmin  =
    .0042184
 xmax  =
    2.6455776

-->xclass=[0:0.4:2.8];xset('window',1);histnorm(x,xclass);



Normal data

-->x=grand(1,200,'nor',2500,1250);xmin=min(x),xmax=max(x)
 xmin  =
    1294.6718
 xmax  =
    6467.2541

-->xclass=[-1000:1000:7000];xset('window',1);histnorm(x,xclass);


Chi-square data

-->x=grand(1,200,'chi',12);xmin=min(x),xmax=max(x)
 xmin  =
    3.8312405
 xmax  =
    28.583772

-->xclass=[0:3:30];xset('window',1);histnorm(x,xclass);



F distribution data

-->x=grand(1,200,'f',10,5);xmin=min(x),xmax=max(x)
 xmin  =
    .110966
 xmax  =
    53.694396

-->xclass=[0:10:60];xset('window',1);histnorm(x,xclass);



-->xclass=[0:2:12];histnorm(x,xclass);

-->xclass=[0:0.5:6];histnorm(x,xclass);

Uniform integer data

-->x=grand(1,200,'uin',-5,5);xmin=min(x),xmax=max(x)
 xmin  =
  - 5.
 xmax  =
    5.

-->xclass=[-5:1:5];xset('window',1);histnorm(x,xclass);


Uniform real data

-->x=grand(1,200,'unf',-5,5);xmin=min(x),xmax=max(x)
 xmin  =
  - 4.9677424
 xmax  =
    4.9660118

-->xclass=[-5:1:5];xset('window',1);histnorm(x,xclass);



Additional notes on function grand

The previous examples were used to illustrate applications of function grand to the generation of data that follow the binomial, Poisson, gamma, beta, exponential, normal, chi-square, F, uniform integer, and uniform real distributions. Function grand also allows the user to obtain data that follow other distributions that are not presented in this book, such as the negative binomial distribution, the multinomial distribution, the non-central F distribution, and the non-central chi-square distribution. (To find information about these and other distributions consult a statistics and probability textbook such as Spanos, A., 1999, "Probability Theory and Statistical Inference - Econometric Modeling with Observational Data," Cambridge University Press, Cambridge, U.K.)

To obtain additional details on the use of function grand use:

-->help grand

Function grand has access to 32 different random number generators that constitute the basis upon which random numbers that follow a particular probability distribution are generated. By default, functions rand and grand use generator number 1. To check which random number generator is currently active use:

-->grand('getcgn')
 ans  =
    1.

This result indicates that you are currently using SCILAB's default random number generator. The random number generators provided by SCILAB for use with function grand require two seed numbers. To see the current seed numbers you can use the statement:

-->seeds = grand('getsd')
 seeds  =
    1.0E+08 *
! 20.45933   9.2172801 !

You can re-initialize those seeds to the original seeds by using:

-->grand('initgn',-1)
 ans  =
    1.

We can check the initial seeds after re-initialization by using:

-->seeds = grand('getsd')
 seeds  =
    1.0E+08 *
! 12.345679   1.2345679 !

You can also re-seed the generator (i.e., provide new seeds) by using the following call to function grand:

-->grand('setall',10,20)
 ans  =
    setall

To check that the new seeds are active use:

-->seeds=grand('getsd')
 seeds  =
! 10.   20. !

To change the random number generator from generator number 1 to generator number 5, for example, use:

-->grand('setcgn',5)
 ans  =
    5.

The following call to function grand can be used to verify that the change of generator has been made:

-->grand('getcgn')
 ans  =
    5.

To check the values of the seeds for the current generator use:

-->seeds=grand('getsd')
 seeds  =
! 3.795E+08   77757764. !

Pseudo-random generators

The random number generators used in SCILAB and other computer applications are known as pseudo-random generators because, after generating a sufficiently long sequence of numbers, the numbers start repeating. Therefore, they are not strictly random generators, but only pseudo-random.

The random number generator provided with SCILAB is able to produce 2.3×10^18 numbers before repetition of numbers occurs. This collection of numbers is partitioned into 32 pseudo-random generators, each containing 2^20 = 1,048,576 blocks of non-overlapping random numbers. Each block is 2^30 = 1,073,741,824 numbers in length.


Given the size of the sequences of random numbers that can be generated with each of SCILAB's 32 pseudo-random number generators, we are confident that the numbers thus generated are random enough for most practical applications. Furthermore, use of the default generator should be enough for most applications unless you

Another application of function grand is in the generation of permutations of a column vector. For example, the following application produces 10 permutations of the vector M containing the first five positive integers. The permutations are shown as columns of a matrix.

-->M = [1 2 3 4 5]';

-->grand(10,'prm',M)
 ans  =
! 1. 2. 4. 1. 4. 4. 5. 4. 1. 3. !
! 3. 1. 2. 4. 2. 2. 1. 3. 4. 2. !
! 2. 3. 5. 5. 5. 3. 2. 2. 2. 5. !
! 5. 4. 3. 3. 3. 5. 3. 1. 3. 4. !
! 4. 5. 1. 2. 1. 1. 4. 5. 5. 1. !
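A permutation must contain every element of M exactly once, so sorting any column should give M back. The sketch below checks this for the 10 permutations. It assumes a recent Scilab release in which grand(n,'prm',vect) returns one permutation of a column vector vect per column.

```scilab
M = [1 2 3 4 5]';
P = grand(10, "prm", M);     // 10 random permutations, one per column

// each column, sorted in increasing order, must reproduce M
ok = %t;
for j = 1:size(P, 2)
  ok = ok & and(gsort(P(:, j), "g", "i") == M);
end
disp(ok);
```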

Generating log-normally-distributed data

To generate log-normally distributed data we first generate a set of normally distributed data and then apply the exponential function to that data set. For example, if X follows the lognormal distribution with µln(X) = 1.2, σln(X) = 0.5, we can use the following SCILAB commands to generate a set of 200 data points. We apply functions histnorm and normplot to this data set to check how close the data are to normality.

-->y=grand(1,200,'nor',1.2,0.5);        //Generate normal data N(1.2,0.5)
-->x=exp(y);                            //Generate log-normal data by using exp
-->xmin=min(x),xmax=max(x)              //Determine min and max values
 xmin  =
    1.1210567
 xmax  =
    11.161347
-->xclass=[0:2:12];histnorm(x,xclass);  //Histogram

-->normplot(x); //Normal probability plot


Generating data that follows the Weibull distribution

SCILAB does not provide a function to generate data that follows the Weibull distribution. However, using the uniformly-distributed random numbers from function rand we can generate numbers p between 0 and 1 that represent probabilities p = FX(x) = P(X<x). Next, we use the cumulative distribution function for the Weibull distribution, namely,

$$F_X(x) = 1 - \exp\left(-\alpha x^{\beta}\right), \qquad x > 0,\; \alpha > 0,\; \beta > 0,$$

and solve for x given values of p, i.e.,

$$x = \left[-\frac{\ln(1-p)}{\alpha}\right]^{1/\beta}.$$

The following SCILAB commands are used to generate 200 data points that follow the Weibull distribution with α = 2, β = 3. We also use functions histnorm and normplot to check how close these data are to normality.

-->getf('histnorm');getf('normplot')    //Load functions
-->p=rand(1,200);                        //Generate probabilities
-->a=2; b=3;                             //Parameters of Weibull distribution
-->x = (-log(1-p)/a).^(1/b);             //Generate Weibull data
-->xmin=min(x), xmax = max(x)            //Check data range
 xmin  =
    .1230276
 xmax  =
    1.3553315
-->xclass = [0:0.1:1.4];                 //Select classes for histogram
-->histnorm(x,xclass);                   //Plot histogram and normal curve


-->normplot(x) //create normal probability plot

It is interesting to notice that these Weibull data are very close to normality. This is not surprising, because with β = 3 the Weibull pdf is nearly symmetric.
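The Weibull inverse can be checked by substituting the generated values back into the CDF. FX(x) should return the probabilities p that produced them. The sketch below assumes a recent Scilab release and uses a few fixed probabilities instead of rand, so that the check is repeatable.

```scilab
a = 2; b = 3;                       // alpha and beta, as in the text
p = [0.05 0.25 0.5 0.75 0.95];      // fixed probabilities
x = (-log(1 - p)/a).^(1/b);         // inverse CDF: x = [-ln(1-p)/alpha]^(1/beta)

F = 1 - exp(-a*x.^b);               // Weibull CDF evaluated at x
disp([p; x; F]);                    // the third row should reproduce p
```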

Generating data that follows the Student's t distribution

Function grand does not allow for the generation of data following the Student's t distribution. However, SCILAB provides function cdft, which lets you obtain the inverse of the cumulative distribution function. Using an approach similar to that shown above for the Weibull distribution, we can generate random probability values through function rand, and then use function cdft to generate the data required.

The following example illustrates the procedure:

-->getf('histnorm');getf('normplot');   //Load functions histnorm & normplot
-->pp = rand(1,200);                     //Generate random probabilities
-->x = [];                               //This line and the for … end
-->for j = 1:200                         //loop calculate the values of x
-->  x = [x cdft("T",6,pp(j),1-pp(j))];
-->end;
-->xmin=min(x), xmax=max(x)              //Determine min & max values
 xmin  =
  - 6.9441809
 xmax  =
    3.4425429

-->xclass=[-7:1:4];xset('window',1);histnorm(x,xclass);   //Histogram

-->xset('window',2);normplot(x);   //Normal probability plot
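The loop above calls cdft once per value. Because cdft accepts vector arguments of the same size, the same kind of data can be generated in a single call, as in the sketch below. The sketch assumes a recent Scilab release, and the "uniform" key is passed to rand explicitly in case rand was left in normal mode. The round trip through the "PQ" form confirms that the values are t quantiles of the random probabilities.

```scilab
n  = 200;
nu = 6;
pp = rand(1, n, "uniform");                 // random probabilities in (0,1)

// cdft accepts vectors of equal size, so no loop is needed
x = cdft("T", nu*ones(pp), pp, 1 - pp);

// round trip: the "PQ" form should give back the probabilities
[P, Q] = cdft("PQ", x, nu*ones(x));
mprintf("largest |P - pp| = %g\n", max(abs(P - pp)));
```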

Generating data that follows a discrete distribution

Using function grand we were able to generate discrete data that follow the binomial, Poisson, and uniform integer distributions. In this section we present a general method for the generation of data given a discrete distribution in the form of a table. For example, the following table shows the probability mass function, fX(x) = P(X=x), and cumulative distribution function, FX(x) = P(X≤x), of a discrete random variable X:

                          Random numbers
  X      fX(x)   FX(x)    from    to
  0.5    0.10    0.10     0.00    0.10
  1.5    0.25    0.35     0.10    0.35
  2.5    0.20    0.55     0.35    0.55
  3.5    0.15    0.70     0.55    0.70
  4.5    0.15    0.85     0.70    0.85
  5.5    0.15    1.00     0.85    1.00

The last two columns of the table represent the range of probabilities corresponding to the cumulative distribution function for each value of X. The procedure for generating data consists in obtaining a random probability value p from a uniform distribution, e.g., using function rand, and then assigning a value of X according to the range of random numbers into which p falls. Thus, if function rand produces the random number 0.25, which lies between 0.10 and 0.35, we assign to x the corresponding value X = 1.5. The following function, discrand, will generate an n×m matrix of random numbers given vectors of values of X and FX, representing the values of a discrete random variable and its corresponding cumulative distribution function.

function [x] = discrand(n,m,xx,FX)
//A function to generate a matrix nxm
//following a discrete probability distribution
//represented by vectors xx and FX = P(X<=xx)

nx = length(xx);
pp = rand(n,m);
x = zeros(n,m);
FXX = [0.00 FX];
for i = 1:n
  for j = 1:m
    for k = 1:nx
      if pp(i,j)>FXX(k) & pp(i,j)<=FXX(k+1) then
        x(i,j) = xx(k);
      end;
    end;
  end;
end;
//end function discrand

An application of the function to generate 200 data points that follow the probability distribution shown in the table above is presented next. We first load function discrand, then enter the values of X and FX(x), and generate a row vector of 200 points. Next, we load functions histnorm and normplot to check how well the data follow a normal distribution.

-->getf('discrand')

-->X = [0.5:1.0:5.5]; FX = [0.10,0.35,0.55,0.70,0.85,1.00];

-->x=discrand(1,200,X,FX);

-->getf('histnorm');getf('normplot');

-->xmin=min(x), xmax=max(x)
 xmin  =
    .5
 xmax  =
    5.5

-->xclass=[0.5:0.5:5.5];

-->histnorm(x,xclass)
 ans  =
    24.643214


-->normplot(x)
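The triple loop in discrand can be replaced by an interval search. The sketch below assumes a recent Scilab release, where dsearch(p, [0 FX]) returns, for each p, the index k of the interval (FX(k-1), FX(k)] that contains it (the first interval is closed at 0). The functions discmap and discrand_vec are hypothetical names, not part of SCILAB or of this chapter's files.

```scilab
// Hypothetical helper: map probabilities pp to values of X by locating
// the interval (FX(k-1), FX(k)] that contains each probability
function x = discmap(pp, xx, FX)
  k = dsearch(pp, [0 FX]);                         // interval index for each pp
  x = matrix(xx(k(:)), size(pp, 1), size(pp, 2));  // same shape as pp
endfunction

// Hypothetical vectorized counterpart of discrand
function x = discrand_vec(n, m, xx, FX)
  x = discmap(rand(n, m), xx, FX);
endfunction

X  = [0.5:1.0:5.5];
FX = [0.10, 0.35, 0.55, 0.70, 0.85, 1.00];
disp(discmap([0.05 0.25 0.35 0.36 0.99], X, FX));  // fixed probabilities
x = discrand_vec(1, 200, X, FX);                   // 200 random values
```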

Statistical simulation

Many physical or other types of systems are described by one or more mathematical relationships (e.g., algebraic, difference, or differential equations) of diverse degrees of complexity. We will refer to the set of mathematical relationships that describe a physical system as a model. A model typically depends on certain constant values known as the parameters of the model. In the simplest of cases, a model can be represented by a black box into which a set of input data is provided, and from which a set of output results is obtained. This is illustrated in the following figure:

If the model is such that for a given set of input data it always produces a predictable result, it is referred to as a deterministic model. An example of a deterministic model is the equation that describes the electric current, I, through a resistor, R, when a voltage, V, is applied across the terminals of the resistor. The equation is

I = V/R.

If we apply a constant voltage Vo to the resistor, we get back a constant electric current, Io = Vo/R. If we instead apply a variable voltage V(t) = Vo⋅sin(ωt), we obtain an electric current, I(t) = (Vo/R)⋅sin(ωt). Thus, knowing the value of the resistance R and the input to the system, i.e., the voltage, Vo or V(t), we can always find the value of the electric current. We cannot get more deterministic than this example.

If the input to the model is random in nature, or if there is a random component to the model itself, the model is said to be probabilistic or stochastic. For example, the black-box model described above can be used to describe a hydrological basin. The input data are the amount and duration of the precipitation falling on the basin over a certain period of time. (A graphical representation of precipitation vs. time is referred to as a hyetograph.) This input is, by its very nature, random or stochastic. This means that we cannot know exactly the amount of precipitation that will occur, say, in the next 24 hours.

Although a hydrological basin is far more complicated than an electric resistor, the model used to predict the runoff (output) from the system can be a simple relationship involving one or two parameters. (A graphical representation of the runoff coming out of the basin as a function of time is known as a hydrograph.) If the input hyetograph is known, then the output hydrograph can be obtained in a deterministic way. However, because we do not know the input hyetograph for a particular period of time exactly, but only in a statistical sense, the model is indeed a stochastic one.

By keeping historical records of precipitation in the basin, we can get a good idea of the stochastic nature of precipitation to use as input for our stochastic model. We can then generate synthetic data representing the precipitation and use them as input to the model. This approach to modeling physical (or economic, or other types of) systems is known as a Monte Carlo method. (The name derives from Monte Carlo, the district of the European principality of Monaco that is famous for its casino, where the laws of probability are seen in action night and day.)
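The following sketch makes the idea concrete using the document's own deterministic model. It treats Ohm's law, I = V/R, as the black box and feeds it randomly generated voltages. Several values are assumptions chosen only for illustration: the normal distribution for V, the resistance R = 10, the mean voltage of 12, and the standard deviation of 1.5. In the hydrological example, the synthetic input would instead be precipitation generated from historical records.

```python
import random
import statistics

def model(V, R=10.0):
    """Deterministic black box: current through a resistor (Ohm's law)."""
    return V / R

def monte_carlo(n_runs, v_mean=12.0, v_sdev=1.5, R=10.0, seed=0):
    """Feed n_runs synthetic random voltages (assumed normal) through the
    model; return the mean and sample standard deviation of the output."""
    rng = random.Random(seed)
    currents = [model(rng.gauss(v_mean, v_sdev), R) for _ in range(n_runs)]
    return statistics.mean(currents), statistics.stdev(currents)

I_mean, I_sdev = monte_carlo(10000)
print(I_mean, I_sdev)   # close to 12/10 = 1.2 and 1.5/10 = 0.15
```

The model stays deterministic. Only the input is random, and the output statistics are estimated by repeating the run many times.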

Monte Carlo methods are applicable to all types of models where there is a random component in the input or in the parameters of the model. Statistical modeling can be used to model, for example, the economic responses of human populations, the distribution of soil permeabilities in an aquifer, the distribution of animal or plant populations, traffic patterns on highways or at airports, weather phenomena, etc. A simple application of a Monte Carlo method to simulate the patterns of traffic through a service station is shown below.

Simulating traffic through a service station

Suppose we want to simulate the traffic through a service station in which only one customer can be serviced at a time. We also assume that once a customer arrives at the service station, he or she will not leave until service is provided. This is a simplistic model, but it could be used to simulate a vehicle service station in a city or on a highway, a medical emergency room, a highway service station for state-owned or privately owned trucks, a store, etc.

The first customer arrives at a certain arrival time, AT1 (Arrival Time). He or she is taken care of right away, so the starting time of service for customer 1, ST1 (Starting Time), coincides with his or her arrival time; thus, ST1 = AT1. The waiting time for customer 1 is therefore zero, i.e., WT1 = 0. The number of customers awaiting service at this point is also zero, i.e., NW1 = 0. The time required to service this first customer is referred to as TS1 (Time of Service). The first customer leaves the service station at time ET1 = ST1 + TS1 (Ending Time).

The second customer arrives at the service station at a time AT2. If AT2 < ET1 (i.e., the second customer arrives before service for the first one has finished), the second customer must wait until the first customer leaves, so ST2 = ET1. In this case, the waiting time for the second customer is WT2 = ET1 - AT2, and the number of customers waiting for service at this point is NW2 = 1. If, instead, the second customer arrives at a time AT2 ≥ ET1, then ST2 = AT2 and WT2 = 0. In either case, the ending time for the second customer is calculated as ET2 = ST2 + TS2.

We define the inter-arrival time between customers 1 and 2 as IAT1 = AT2 - AT1. In general, the inter-arrival time between customers i and i+1 is IATi = ATi+1 - ATi. The inter-arrival time (IATi) and the time of service (TSi) are considered random variables of discrete nature. Thus, IATi and TSi constitute the random input to the model.

Suppose that we want to simulate the operation of the service station for n customers. We first generate n-1 values of the inter-arrival time {IAT1, IAT2, …, IATn-1}, as well as n values of the service time {TS1, TS2, …, TSn}. Then, we calculate the arrival times as

ATi+1 = ATi + IATi, i = 1, 2, …, n-1.
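The recurrence simply accumulates the inter-arrival times, so the arrival times are a running sum. The minimal Python sketch below assumes that the first customer arrives at time AT1 = 0, which is the convention used by function service. It uses the inter-arrival times from the worked example of that function.

```python
from itertools import accumulate

def arrival_times(IAT, AT1=0.0):
    """AT(i+1) = AT(i) + IAT(i): running sum of the inter-arrival times,
    starting from the first arrival time AT1 (assumed 0 by default)."""
    return list(accumulate(IAT, initial=AT1))

AT = arrival_times([0.5, 0.75, 0.5, 0.25, 0.5])
print(AT)  # [0.0, 0.5, 1.25, 1.75, 2.0, 2.5]
```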

As indicated earlier, the starting and ending times for the first customer are ST1 = AT1 and ET1 = ST1 + TS1. Also, the waiting time and the number of customers waiting at this stage are both zero, i.e., WT1 = 0 and NW1 = 0. The starting time for customer 2 is obtained as follows:

If AT2 ≥ ET1, then ST2 = AT2, WT2 = 0, and NW2 = 0.
If AT2 < ET1, then ST2 = ET1, WT2 = ET1 - AT2, and NW2 = 1.

For the third customer, we need to check the arrival time, AT3, against the ending times of both the first and second customers. This lets us determine the starting time, the waiting time, and the number of customers waiting at that point. In general, customer j must be checked against the ending times of all previous customers. The following pseudo-code can be used to determine these values:

for j = 2:n
    NWj = 0
    WTj = 0
    for k = 1:j-1
        if ATj < ETk then
            NWj = NWj + 1
            WTj = ETk - ATj
            STj = ETk
        else
            STj = ATj
        end
    end
    ETj = STj + TSj
end

A user-defined function to simulate traffic through a service station

The steps outlined above are put together in the following function, service:

function [MR] = service(IAT,TS)
//Simulation of traffic in a service station
//Given n-1 values of inter-arrival time IAT
//and n values of time of service TS.
//Results:
//Arrival time = AT, Starting time = ST
//Ending time = ET, Waiting time = WT
//Number of waiting customers = NW
//
n = length(TS);
AT = zeros(1,n); ST = zeros(1,n); ET = zeros(1,n);
NW = zeros(1,n); WT = zeros(1,n);
IATT = [IAT 0];
ST(1) = AT(1);
ET(1) = ST(1) + TS(1);
for j = 2:n
  AT(j) = AT(j-1) + IAT(j-1);
end;
for j = 2:n
  NW(j) = 0;
  WT(j) = 0;
  for k = 1:j-1
    if AT(j) < ET(k) then
      NW(j) = NW(j) + 1;
      WT(j) = ET(k) - AT(j);
      ST(j) = ET(k);
    else
      ST(j) = AT(j);
    end;
  end;
  ET(j) = ST(j)+TS(j);
end;
disp(' ');
printf('===============================================================\n');
printf(' j AT IAT ST TS ET WT NW \n');
printf('===============================================================\n');
for j = 1:n
  printf('%3.0f %8.2f %8.2f %8.2f %8.2f %8.2f %8.2f %3.0f\n',...
  j,AT(j),IATT(j),ST(j),TS(j),ET(j),WT(j),NW(j));
end;
printf('===============================================================\n');
MR = [AT' IATT' ST' TS' ET' WT' NW']; //Matrix of Results
printf('AT = arrival times IAT = inter-arrival times \n');
printf('ST = starting times TS = time of service \n');
printf('ET = ending times WT = waiting times \n');
printf('NW = number of customers waiting \n');
disp(' AT IAT ST TS ET WT NW');
//end function service
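To check the logic outside SCILAB, the following Python port of function service mirrors the loops above. Two choices are made here and are not part of the original function: the tuple layout of each returned row, and the simplified one-line printing.

```python
def service(IAT, TS):
    """Single-server, first-come first-served queue: a Python port of the
    SCILAB function service. IAT has n-1 inter-arrival times and TS has n
    service times. Returns rows (j, AT, IAT, ST, TS, ET, WT, NW)."""
    n = len(TS)
    if len(IAT) != n - 1:
        raise ValueError("need n-1 inter-arrival times for n service times")
    AT = [0.0] * n                      # arrival times, AT[0] = 0
    ST = [0.0] * n                      # starting times of service
    ET = [0.0] * n                      # ending times of service
    WT = [0.0] * n                      # waiting times
    NW = [0] * n                        # earlier customers still in the station
    IATT = list(IAT) + [0.0]            # padded so every row has an IAT entry
    ST[0] = AT[0]
    ET[0] = ST[0] + TS[0]
    for j in range(1, n):
        AT[j] = AT[j - 1] + IAT[j - 1]
    for j in range(1, n):
        for k in range(j):
            if AT[j] < ET[k]:           # customer k has not left yet
                NW[j] += 1
                WT[j] = ET[k] - AT[j]
                ST[j] = ET[k]
            else:
                ST[j] = AT[j]
        # The last pass (k = j-1) fixes ST[j]; ending times never decrease,
        # so ET[j-1] is the time at which the server becomes free.
        ET[j] = ST[j] + TS[j]
    return [(j + 1, AT[j], IATT[j], ST[j], TS[j], ET[j], WT[j], NW[j])
            for j in range(n)]

for row in service([0.5, 0.75, 0.5, 0.25, 0.5], [1, 2, 1, 1, 2, 1]):
    print("%3d %8.2f %8.2f %8.2f %8.2f %8.2f %8.2f %3d" % row)
```

When run on the inter-arrival and service times of the worked example that follows, the port should reproduce the AT, ST, ET, WT, and NW columns of the SCILAB table.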

As an example, suppose that we have the following inter-arrival times (IAT) and times of service (TS):


-->IAT = [ 0.5 0.75 0.5 0.25 0.5];

-->TS = [ 1 2 1 1 2 1];

We can load function service and run it with the values of IAT and TS defined above to obtain the following results:

-->Matrix_of_results = service(IAT,TS)

===============================================================
  j       AT      IAT       ST       TS       ET       WT  NW
===============================================================
  1     0.00      .50     0.00     1.00     1.00     0.00   0
  2      .50      .75     1.00     2.00     3.00      .50   1
  3     1.25      .50     3.00     1.00     4.00     1.75   1
  4     1.75      .25     4.00     1.00     5.00     2.25   2
  5     2.00      .50     5.00     2.00     7.00     3.00   3
  6     2.50     0.00     7.00     1.00     8.00     4.50   4
===============================================================
AT = arrival times    IAT = inter-arrival times
ST = starting times   TS = time of service
ET = ending times     WT = waiting times
NW = number of customers waiting

 AT IAT ST TS ET WT NW
 Matrix_of_results  =

! 0.    .5    0.   1.   1.   0.     0. !
! .5    .75   1.   2.   3.   .5     1. !
! 1.25  .5    3.   1.   4.   1.75   1. !
! 1.75  .25   4.   1.   5.   2.25   2. !
! 2.    .5    5.   2.   7.   3.     3. !
! 2.5   0.    7.   1.   8.   4.5    4. !

The function is designed to provide a table of results, as well as a matrix summarizing the results in case additional operations on those results are required within SCILAB. As applied in this case, the function is purely deterministic in the sense that the given input produces a unique result. To carry out stochastic modeling of traffic through a service station, we need to provide random input. The following example shows how to obtain that random input.

Modeling traffic through a service station with random input

Suppose that the inter-arrival times and times of service for the service station model follow the cumulative probability distributions shown in the following table:

x = IAT   FX(x)     x = TS   FX(x)
  0.1     0.05       0.25    0.10
  0.2     0.10       0.50    0.20
  0.3     0.20       0.75    0.40
  0.4     0.35       1.00    0.70
  0.5     0.45       1.25    0.80
  0.6     0.50       1.50    0.90
  0.7     0.70       1.75    0.95
  0.8     0.75       2.00    1.00
  0.9     0.95
  1.0     1.00


We want to analyze the traffic through the service station for 10 customers by generating 9 inter-arrival times and 10 service times from these distributions. The inter-arrival times and times of service can be generated using function discrand as follows:

-->getf('discrand')

-->xIAT = [0.1:0.1:1.0]; FIAT = [0.05,0.1,0.2,0.35,0.45,0.5,0.7,0.75,0.95,1.0];

-->xTS = [0.25:0.25:2]; FTS = [0.1,0.2,0.4,0.7,0.8,0.9,0.95,1];

-->IAT = discrand(1,9,xIAT,FIAT) //generate IAT data
 IAT  =

! .4  .7  .7  .5  .4  .7  .5  .9  .1 !

-->TS = discrand(1,10,xTS,FTS) //generate TS data
 TS  =

! 1.  .75  1.  .75  .5  1.25  .75  .5  1.  .5 !

With these values of IAT and TS we now call function service:

-->M = service(IAT,TS)

===============================================================
  j       AT      IAT       ST       TS       ET       WT  NW
===============================================================
  1     0.00      .40     0.00     1.00     1.00     0.00   0
  2      .40      .70     1.00      .75     1.75      .60   1
  3     1.10      .70     1.75     1.00     2.75      .65   1
  4     1.80      .50     2.75      .75     3.50      .95   1
  5     2.30      .40     3.50      .50     4.00     1.20   2
  6     2.70      .70     4.00     1.25     5.25     1.30   3
  7     3.40      .50     5.25      .75     6.00     1.85   3
  8     3.90      .90     6.00      .50     6.50     2.10   3
  9     4.80      .10     6.50     1.00     7.50     1.70   3
 10     4.90     0.00     7.50      .50     8.00     2.60   4
===============================================================
AT = arrival times    IAT = inter-arrival times
ST = starting times   TS = time of service
ET = ending times     WT = waiting times
NW = number of customers waiting

 AT IAT ST TS ET WT NW
 M  =

! 0.   .4   0.     1.     1.     0.     0. !
! .4   .7   1.     .75    1.75   .6     1. !
! 1.1  .7   1.75   1.     2.75   .65    1. !
! 1.8  .5   2.75   .75    3.5    .95    1. !
! 2.3  .4   3.5    .5     4.     1.2    2. !
! 2.7  .7   4.     1.25   5.25   1.3    3. !
! 3.4  .5   5.25   .75    6.     1.85   3. !
! 3.9  .9   6.     .5     6.5    2.1    3. !
! 4.8  .1   6.5    1.     7.5    1.7    3. !
! 4.9  0.   7.5    .5     8.     2.6    4. !

From the matrix of results, M, we can extract individual columns of data. For example, the waiting time data correspond to the sixth column of M:


-->WT = M(:,6)
 WT  =

! 0.   !
! .6   !
! .65  !
! .95  !
! 1.2  !
! 1.3  !
! 1.85 !
! 2.1  !
! 1.7  !
! 2.6  !

The number of waiting customers is extracted from the seventh column of matrix M:

-->NW = M(:,7)
 NW  =

! 0. !
! 1. !
! 1. !
! 1. !
! 2. !
! 3. !
! 3. !
! 3. !
! 3. !
! 4. !

The columns of data extracted from the matrix of results, M, can be used to obtain statistics such as the mean and standard deviation:

-->WT_mean = mean(WT), WT_sdev = st_deviation(WT)
 WT_mean  =
    1.295
 WT_sdev  =
    .7836701

-->NW_mean = mean(NW), NW_sdev = st_deviation(NW)
 NW_mean  =
    2.1
 NW_sdev  =
    1.2866839
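As a cross-check outside SCILAB, Python's statistics module gives the same numbers for the WT and NW columns listed above. The agreement of the standard deviations suggests that st_deviation divides by n - 1, giving the sample standard deviation, which is also what statistics.stdev computes.

```python
import statistics

WT = [0.0, 0.6, 0.65, 0.95, 1.2, 1.3, 1.85, 2.1, 1.7, 2.6]
NW = [0, 1, 1, 1, 2, 3, 3, 3, 3, 4]

# statistics.stdev uses the n - 1 divisor (sample standard deviation)
WT_mean, WT_sdev = statistics.mean(WT), statistics.stdev(WT)
NW_mean, NW_sdev = statistics.mean(NW), statistics.stdev(NW)
print(WT_mean, WT_sdev)
print(NW_mean, NW_sdev)
```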

We can also use function normplot to check how close the data are to normality:

-->getf('normplot')

-->normplot(NW')


-->normplot(WT')

STIXBOX: a rudimentary statistics toolbox

STIXBOX (an abbreviation of statistical toolbox) is a collection of functions that perform selected statistical and probability calculations. STIXBOX is available for download from the SCILAB main web page (http://www-rocq.inria.fr/SCILAB/). Instructions for its installation are provided with the downloaded functions. The package includes a set of help manual pages that briefly describe the operation of the functions. Once loaded, the manual pages are available through the main SCILAB Help window.

Probability mass and probability density functions

Probability mass functions, or pmf (for discrete random variables), and probability density functions, or pdf (for continuous random variables), start with the letter d, e.g., dbeta, dbinom, etc. Probability mass functions are denoted by pX(k) = P[X=k], and probability density functions by fX(x). Thus, if X ~ Binomial(n,p) with n = 10, p = 0.5, then P[X=2] = pX(2) = dbinom(2,10,0.5). Similarly, if X ~ Normal(µ,σ²) with µ = 1.5, σ = 0.2, then fX(1.75) = dnorm(1.75,1.5,0.2). The following probability mass and density functions are defined:

dbeta     the beta density function
dbinom    the binomial probability function
dchisq    the chi-square density function
df        the F density function [modified by the author, 2/1/2001]
dgamma    the gamma density function
dhypgeo   the hypergeometric probability function
dnorm     the normal density function [modified by the author, 2/1/2001]
dt        the Student t density function
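To make the notation concrete, the two values quoted above can be computed directly from their definitions in Python. This sketch uses only the standard library (math.comb and statistics.NormalDist). dbinom_sketch is a hypothetical helper, not the STIXBOX function.

```python
import math
from statistics import NormalDist

def dbinom_sketch(k, n, p):
    """Binomial pmf: P[X = k] = C(n,k) p^k (1-p)^(n-k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

p_2 = dbinom_sketch(2, 10, 0.5)                  # 45/1024 = 0.0439453125
f_175 = NormalDist(mu=1.5, sigma=0.2).pdf(1.75)   # fX(1.75) for Normal(1.5, 0.2^2)
print(p_2, f_175)
```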

Cumulative distribution functions

Cumulative distribution functions (cdf) are referred to as distribution functions when dealing with continuous variables, or as cumulative probability functions when dealing with discrete variables. All cdfs in this package start with a p: pbeta, pbinom, etc. Both discrete and continuous cdfs are denoted by FX(x) = P[X≤x]. Thus, if X ~ Binomial(n,p) with n = 10, p = 0.5, then P[X≤2] = FX(2) = pbinom(2,10,0.5). Similarly, if X ~ Normal(µ,σ²) with µ = 1.5, σ = 0.2, then FX(1.75) = pnorm(1.75,1.5,0.2). The following cumulative distribution functions are defined:

pbeta     the beta distribution function
pbinom    the binomial cumulative probability function
pchisq    the chi-square distribution function
pf        the F distribution function
pgamma    the gamma distribution function
phypge    the hypergeometric cumulative probability function
pnorm     the normal distribution function
pt        the Student t cdf [modified by the author, 2/1/2001]
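The cdf values quoted above can be checked the same way. For a discrete variable, FX(x) sums the pmf up to x. For the normal distribution, it is the standard normal cdf evaluated at (x - µ)/σ. pbinom_sketch is a hypothetical helper written for this illustration.

```python
import math
from statistics import NormalDist

def pbinom_sketch(x, n, p):
    """Binomial cdf FX(x) = P[X <= x]: sum of the pmf over k = 0, 1, ..., floor(x)."""
    kmax = min(math.floor(x), n)
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(kmax + 1))

F_2 = pbinom_sketch(2, 10, 0.5)            # (1 + 10 + 45)/1024 = 0.0546875
F_175 = NormalDist(1.5, 0.2).cdf(1.75)     # Phi((1.75 - 1.5)/0.2) = Phi(1.25)
print(F_2, F_175)
```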

Inverse cumulative distribution functions

Inverse cumulative distribution functions start with q: qbeta, qbinom, etc. If FX(q) = P[X≤q] = p, then q = FX⁻¹(p). The value q is also referred to as a quantile of the distribution. The following inverse cumulative distribution functions are defined:

qbeta     the beta inverse distribution function
qbinom    the binomial inverse cdf
qchisq    the chi-square inverse distribution function
qf        the F inverse distribution function
qgamma    the gamma inverse distribution function
qhypg     the hypergeometric inverse cdf
qnorm     the normal inverse distribution function
qt        the Student t inverse distribution function
quantile  empirical quantile (percentile)
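For a continuous distribution the inverse cdf is unique. For a discrete one, FX jumps, so a rule is needed to pick the quantile. This sketch assumes a common convention: return the smallest q with FX(q) ≥ p. That convention is not confirmed for STIXBOX's qbinom. The Python sketch below covers the binomial case, and qbinom_sketch is a hypothetical name.

```python
import math

def qbinom_sketch(prob, n, p):
    """Smallest integer q with P[X <= q] >= prob for X ~ Binomial(n, p).
    This 'smallest q' rule is one common convention for discrete quantiles."""
    cdf = 0.0
    for q in range(n + 1):
        cdf += math.comb(n, q) * p**q * (1 - p)**(n - q)
        if cdf >= prob:
            return q
    return n   # guards against round-off when prob is 1

print(qbinom_sketch(0.5, 10, 0.5))   # 5, since F(4) = 386/1024 < 0.5 <= F(5) = 638/1024
```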

Generating synthetic data

Synthetic data that follow a particular distribution can be generated with the following random number generators, whose names begin with r: rbeta, rbinom, etc. SCILAB already provides function rand, which produces uniformly distributed random numbers (use help rand for more information). The functions provided by STIXBOX generate random numbers that follow the distributions suggested by their names. Thus, to generate n = 10 data values x that follow the normal distribution with µ = 0.5 and σ = 0.1, use rnorm(10,0.5,0.1).

rbeta     random numbers from the beta distribution
rbinom    random numbers from the binomial distribution
rchisq    random numbers from the chi-square distribution
rexpweib  random numbers from the exponential or Weibull distributions
rf        random numbers from the F distribution
rgamma    random numbers from the gamma distribution
rgeom     random numbers from the geometric distribution
rhypg     random numbers from the hypergeometric distribution
rjbinom   random numbers from the binomial distribution (reject method)
rjgamma   generates gamma random deviates (reject method)
rjpoiss   random numbers from the Poisson distribution (reject method)
rnorm     normal random numbers
rjpoiss   random numbers from the Poisson distribution (renewal method)
rt        random numbers from the Student t distribution
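In general, a "renewal method" for Poisson random numbers counts the events of a process whose inter-event times are exponential. If the inter-event times are exponential with rate λ, the number of events falling in one unit of time is Poisson(λ). The sketch below is a generic Python illustration of that idea, not the STIXBOX code, and rpoiss_renewal is a hypothetical name.

```python
import random
import statistics

def rpoiss_renewal(n, lam, rng=random):
    """Return n Poisson(lam) random numbers by the renewal method: add
    exponential(lam) inter-event times until the unit interval is exceeded,
    and count how many events fell inside it."""
    out = []
    for _ in range(n):
        t, count = rng.expovariate(lam), 0
        while t <= 1.0:
            count += 1
            t += rng.expovariate(lam)
        out.append(count)
    return out

x = rpoiss_renewal(20000, 5.0, random.Random(3))
print(statistics.mean(x), statistics.variance(x))   # both should be near lam = 5
```

Both the sample mean and the sample variance should be close to λ, a characteristic property of the Poisson distribution.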

Logistic regression

These functions involve the logistic population growth model (see, for example, Example 8.3, page 504, in Kottegoda, N.T. and R. Rosso, 1997, Probability, Statistics, and Reliability for Civil and Environmental Engineers, The McGraw-Hill Companies, Inc., New York).

lodds     log odds function
loddsinv  compute the inverse of log odds
logitfit  fit a logistic regression model
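This sketch assumes that lodds and loddsinv follow the standard definitions. Under that assumption, the log odds of a probability p is ln(p/(1 - p)), and its inverse is the logistic function 1/(1 + e^(-z)). The logistic function is the same S-shaped curve that appears in the logistic growth model. A minimal Python sketch, with hypothetical function names:

```python
import math

def lodds_sketch(p):
    """Log odds (logit) of a probability 0 < p < 1: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def loddsinv_sketch(z):
    """Inverse of the log odds (logistic function): 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

print(lodds_sketch(0.8), loddsinv_sketch(lodds_sketch(0.8)))
```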

Statistical graphics

These functions produce a variety of statistical graphics. A normal probability paper plot is obtained by using qqnorm. Probability paper plots are also referred to as Q-Q plots, so the corresponding function names start with qq, e.g., qqgamma, qqnorm, etc. Functions histo and plotsym are also of interest.

histo     plot a histogram
identify  identify points on a plot by clicking with the mouse
pairs     pairwise scatter plots (does not work)
plotdens  draw a nonparametric density estimate
plotsym   plot with symbols
qqnorm    normal probability paper
qqplot    plot empirical quantile vs empirical quantile
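A normal probability plot pairs the ordered observations with the standard normal quantiles of their plotting positions. If the data are normal, the points fall close to a straight line. The Python sketch below uses the plotting positions (i - 0.5)/n, which are one common choice. qqnorm may use a different formula, and qqnorm_points is a hypothetical name.

```python
from statistics import NormalDist

def qqnorm_points(data):
    """Return (theoretical normal quantile, ordered observation) pairs for a
    normal probability plot, using plotting positions (i - 0.5)/n."""
    xs = sorted(data)
    n = len(xs)
    z = NormalDist()                      # standard normal, mu = 0, sigma = 1
    return [(z.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

for q, x in qqnorm_points([3.0, 1.0, 2.0]):
    print(round(q, 4), x)
```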

Binomial coefficients

bincoef   calculates binomial coefficients: C(n,r) = n!/(r!(n-r)!)

Resampling methods

Resampling methods reuse the observed sample to estimate the variability, and in some cases the bias, of an estimate computed from that sample. For a quick introduction to the jackknife (so named because, like a jackknife, the method is a handy general-purpose tool) and the bootstrap (named after the expression "lifting oneself by one's bootstraps"), see pp. 116-117 in Kottegoda, N.T. and R. Rosso, 1997, Probability, Statistics, and Reliability for Civil and Environmental Engineers, The McGraw-Hill Companies, Inc., New York.

covboot   bootstrap estimate of the variance of a parameter estimate
covjack   jackknife estimate of the variance of a parameter estimate
stdboot   bootstrap estimate of the parameter standard deviation
stdjack   jackknife estimate of the standard deviation of a parameter
rboot     simulate a bootstrap resample from a sample
ciboot    various bootstrap confidence intervals
test1b    bootstrap t test and confidence interval for the mean
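The jackknife and bootstrap ideas behind stdjack and stdboot can be sketched in a few lines of Python. This is a generic illustration, not the STIXBOX code, and it rests on several choices made here: the function names are hypothetical, the estimator defaults to the sample mean, and the number of bootstrap resamples, B, is arbitrary.

```python
import random
import statistics

def stdjack_sketch(sample, estimator=statistics.mean):
    """Jackknife standard deviation of an estimator: recompute it with each
    observation left out, then scale the spread of those n values."""
    n = len(sample)
    loo = [estimator(sample[:i] + sample[i + 1:]) for i in range(n)]
    m = statistics.mean(loo)
    return ((n - 1) / n * sum((v - m) ** 2 for v in loo)) ** 0.5

def stdboot_sketch(sample, estimator=statistics.mean, B=2000, rng=random):
    """Bootstrap standard deviation of an estimator: recompute it on B
    resamples drawn with replacement and take their standard deviation."""
    n = len(sample)
    reps = [estimator(rng.choices(sample, k=n)) for _ in range(B)]
    return statistics.stdev(reps)

WT = [0.0, 0.6, 0.65, 0.95, 1.2, 1.3, 1.85, 2.1, 1.7, 2.6]   # waiting times from the simulation
print(stdjack_sketch(WT), stdboot_sketch(WT, B=4000, rng=random.Random(0)))
```

For the sample mean, the jackknife result equals s/√n exactly, where s is the sample standard deviation. The bootstrap value is random and varies slightly with the seed.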


Tests, confidence intervals, and model estimation

These are functions related to statistical inference. Of interest for this class are the functions lsfit, test1n, and test2r. Use the help function to obtain additional information on these functions.

cmpmod    compare small linear model versus large one
ciquant   nonparametric confidence interval for quantile
kstwo     Kolmogorov-Smirnov statistic from two samples (needs function pks)
linreg    linear or polynomial regression
lsfit     fit a multiple regression model
lsselect  select a predictor subset for regression
test1n    tests and confidence intervals based on a normal sample
test1r    test for median equals 0 using rank test
test2n    tests and confidence intervals based on two normal samples
test2r    test for equal location of two samples using rank test
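This sketch assumes that regression functions such as linreg are based on ordinary least squares. For the simplest case, a straight line y = b0 + b1·x, the least-squares coefficients have a closed form. The Python sketch below illustrates that formula only and is not the STIXBOX implementation. linreg_sketch is a hypothetical name.

```python
def linreg_sketch(x, y):
    """Least-squares straight line y = b0 + b1*x (simple linear regression)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx                 # slope
    b0 = ybar - b1 * xbar          # intercept: the line passes through (xbar, ybar)
    return b0, b1

print(linreg_sketch([0, 1, 2, 3, 4], [1, 3, 5, 7, 9]))   # (1.0, 2.0)
```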

Stixbox demonstrations

These are SCILAB functions that demonstrate some of the functions contained in STIXBOX.

stixdemo  demonstrate various stixbox routines
stixtest  a second demo for stixbox

Famous datasets

Function getdata is used to load well-known datasets into the SCILAB environment. The datasets included are:

1   Phosphorus Data
2   Scottish Hill Race Data
3   Salary Survey Data
4   Health Club Data
5   Brain and Body Weight Data
6   Cement Data
7   Colon Cancer Data
8   Growth Data
9   Consumption Function
10  Cost-of-Living Data
11  Demographic Data

To activate function getdata and load data into variable x use:

--> x = getdata()

This function produces a dialog box displaying the list of data sets. The user can type in the number of a data set and get back some information about it before it is loaded. The dialog box produced by getdata() is shown below.


The dialog box shows that we have selected data set number 5. Pressing [OK] will load the data as well as provide information as shown below.

Examples on probability distributions using STIXBOX

Plot of the standard normal distribution:

-->z=-4:0.1:4;phi=dnorm(z,0,1);plot(z,phi,'z','phi(z)','standard normal')


Plot of the Student-t distribution for ν = 2, 5, 10, 15, 20:

-->t=-4.0:0.1:4;nu=[2,5,10,15,20];

-->for k=1:5,f=dt(t,nu(k));plot2d(t,f,k,'011',' ',[-4 0 4 0.4]), end

-->xtitle('Student t distribution','t','f(t)')

Plot of the chi-square distribution for nu = 5:

-->x=0:0.1:20;nu=5;f=dchisq(x,nu);

-->plot(x,f,'x','f(x)','Chi-square distribution, nu=5')

Plot of the F distribution for nu1 = 5 and nu2 = 10:

-->x=0:0.1:5;nu1=5;nu2=10;f=df(x,nu1,nu2);

-->plot(x,f,'F','f(F)','F distribution, nu1=5, nu2=10')


Determining zα such that P(Z > zα) = α, or P(Z < zα) = 1 - α. Also, zα/2 is such that P(Z > zα/2) = α/2, or P(Z < zα/2) = 1 - α/2:

-->alpha = 0.05; z_alpha=qnorm(1-alpha), z_alpha2=qnorm(1-alpha/2)
 z_alpha  =
    1.6448536
 z_alpha2  =
    1.959964
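The same quantiles can be checked with Python's statistics.NormalDist, whose inv_cdf method is the inverse of the cdf. For the standard normal, it plays the role that qnorm plays here.

```python
from statistics import NormalDist

alpha = 0.05
Z = NormalDist()                     # standard normal, mu = 0, sigma = 1
z_alpha = Z.inv_cdf(1 - alpha)       # P(Z > z_alpha) = alpha
z_alpha2 = Z.inv_cdf(1 - alpha / 2)  # P(Z > z_alpha/2) = alpha/2
print(z_alpha, z_alpha2)             # approximately 1.6448536 and 1.959964
```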

Determining tν,α such that P(T > tν,α) = α, or P(T < tν,α) = 1 - α. Also, tν,α/2 is such that P(T > tν,α/2) = α/2, or P(T < tν,α/2) = 1 - α/2:

-->nu=10;alpha=0.01;t_alpha=qt(1-alpha,nu),t_alpha2=qt(1-alpha/2,nu)
 t_alpha  =
    2.7637695
 t_alpha2  =
    3.1692727

Determining χ²ν,α such that P(Χ² > χ²ν,α) = α, or P(Χ² < χ²ν,α) = 1 - α. Similar definitions are used to calculate the values χ²ν,1-α, χ²ν,α/2, and χ²ν,1-α/2:

-->nu=6;alpha=0.10;X_alpha=qchisq(1-alpha,nu)
 X_alpha  =
    10.644641

-->X_alpha2=qchisq(1-alpha/2,nu)
 X_alpha2  =
    12.591587

-->nu=6;alpha=0.10;X_alpha=qchisq(alpha,nu)
 X_alpha  =
    2.2041307

-->X_alpha2=qchisq(alpha/2,nu)
 X_alpha2  =
    1.6353829
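For an even number of degrees of freedom, the chi-square cdf has a closed form:

P(Χ² ≤ x) = 1 - e^(-x/2) Σ_{i=0}^{ν/2-1} (x/2)^i / i!

This form allows the quantiles above to be checked using only Python's standard library. The sketch covers even ν only (ν = 6 here), and pchisq_even is a hypothetical helper.

```python
import math

def pchisq_even(x, nu):
    """Chi-square cdf P(X^2 <= x) for an even number of degrees of freedom:
    1 - exp(-x/2) * sum_{i=0}^{nu/2 - 1} (x/2)^i / i!"""
    if nu <= 0 or nu % 2:
        raise ValueError("this closed form needs an even, positive nu")
    h = x / 2.0
    return 1.0 - math.exp(-h) * sum(h**i / math.factorial(i) for i in range(nu // 2))

# (quantile printed by SCILAB above, probability it should correspond to)
checks = [(10.644641, 0.90), (12.591587, 0.95), (2.2041307, 0.10), (1.6353829, 0.05)]
for q, p in checks:
    print(q, pchisq_even(q, 6), p)
```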

Generating 20 data points that follow the Weibull distribution, and producing a normal probability plot for such data:

-->x = rexpweib(20,3,5); qqnorm(x,'o')


Generating 200 data points that follow the binomial distribution. A histogram of the data is then produced.

-->x = rbinom(200,10,0.35); histo(x);

Other options for function histo( ): if we suggest 8 classes (or bins) with parameter odd = 0, the function histo( ) chooses 6 classes:

-->histo(x,8,0)

In the next call, we suggest 15 classes, and the odd parameter takes the value odd = 1:

-->histo(x,15,1)

The next call scales the areas of the histogram bars so that the total area is equal to 1:

-->histo(x,8,0,1)
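Scaling a histogram so that its total area equals 1 means dividing each bin count by n times the bin width. The bars then estimate the probability density. The Python sketch below shows that scaling with bin edges supplied explicitly. It does not reproduce how histo chooses its classes, and density_histogram is a hypothetical name.

```python
def density_histogram(data, edges):
    """Bar heights scaled so the total bar area is 1 (for data inside the
    edges): height = count / (n * bin width). Bins are [edges[i], edges[i+1]),
    with the last bin closed on the right."""
    n = len(data)
    heights = []
    for i in range(len(edges) - 1):
        lo, hi = edges[i], edges[i + 1]
        last = i == len(edges) - 2
        count = sum(1 for v in data if lo <= v < hi or (last and v == hi))
        heights.append(count / (n * (hi - lo)))
    return heights

edges = [0, 1, 2, 4]
h = density_histogram([0, 1, 1, 2, 2, 2, 3, 3, 4], edges)
print(h, sum(hk * (edges[k + 1] - edges[k]) for k, hk in enumerate(h)))
```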


Exercises

[1]. The probability of a flood occurring in a particular section of a river in a given month is estimated, from existing records, to be 0.15. (a) What is the probability that there will be three months of flood in the next year? (b) What is the probability that there will be fewer than 6 months of flood in the next year?

[2]. Data kept at an airport show an average of five cars per minute stopping to drop off or pick up passengers at the terminal curb. (a) What is the probability that in the next minute there will be 10 or more cars stopping at the curb? (b) What is the probability that there will be no cars at the curb in a given minute?

[3]. It is known that 25 out of a batch of 200 concrete cylinders were prepared using a defective type of cement. If a laboratory receives a sample of 15 of those cylinders, what is the probability that the sample will contain 5 of the defective cylinders?

[4]. If a factory is known to produce 5% defective truck tires, what is the probability that in a given assembly line the first defective tire is detected after 20 tires have come out of the assembly line? What is the probability that the first defective tire is detected after 10 tires have come out of the assembly line?

[5]. The time required to finish the construction of a mile of a particular highway is known to have a normal distribution with a mean value of 3.5 days and a standard deviation of 0.5 days. What is the probability that the next mile of the road will be completed in between 3 and 5 days? What is the probability that the construction of the next mile of the road will take more than 7 days?

[6]. Let X represent the intensity of an earthquake on a particular scale. If X is modeled using the exponential distribution with parameter β = 6.5, determine the probability that the intensity of the next earthquake will be 3.5 or less. Also, determine the probability that the intensity of the earthquake will be between 2.5 and 4.5.

[7]. The gamma distribution, with parameters α = 1.2 and β = 0.5, is used to model the time of failure (in hours) of an electronic component. Determine the probability that a particular component will last 100 hours or more. Determine the probability that the component will last less than 2 hours.

[8]. If the wind velocity in miles per hour near a harbor is assumed to follow a Weibull distribution with parameters α = 2 and β = 3, determine the probability of the wind velocity being between 15 and 75 mph. Also, determine the probability of the wind velocity being larger than 10 mph.

[9]. For a large value of n, the Binomial distribution can be approximated by the normal distribution with parameters µ = np and σ = √(np(1-p)). Suppose that you receive a shipment of 1000 resistors produced by a machine that is known to produce 0.5% defective resistors. What is the probability that there will be more than 200 defective resistors in the shipment, using (a) the normal approximation to the Binomial distribution, and (b) the Poisson approximation to the Binomial distribution?

[10]. Plot the probability mass function, fX(x), and the cumulative distribution function, FX(x), for the following discrete distributions:

(a) Binomial with n = 20, p = 0.25
(b) Binomial with n = 20, p = 0.50
(c) Binomial with n = 20, p = 0.75
(d) Poisson with λ = 5.0, plot for x = 0,1,2,…,10
(e) Geometric with p = 0.25, for x = 1,2,…,10
(f) Geometric with p = 0.50, for x = 1,2,…,10
(g) Geometric with p = 0.75, for x = 1,2,…,10
(h) Hypergeometric with N = 100, n = 20, a = 40
(i) Hypergeometric with N = 40, n = 10, a = 20
(j) Hypergeometric with N = 120, n = 80, a = 10

[11]. Let X be a discrete random variable that follows the binomial distribution with parameters n and p. Let P0 = P(X ≤ x). Calculate:

(a) P0 given n = 20, p = 0.35, x = 5
(b) n given p = 0.25, x = 8, P0 = 0.80
(c) p given n = 25, x = 20, P0 = 0.75
(d) x given n = 10, p = 0.80, P0 = 0.30

[12]. Plot the probability density function, fX(x), and the cumulative distribution function, FX(x), for the following continuous distributions:

(a) Gamma with α = 0.5, β = 1.5
(b) Gamma with α = 2, β = 3
(c) Beta with α = 0.5, β = 1.5
(d) Beta with α = 3, β = 2
(e) Weibull with α = 0.5, β = 1.5
(f) Weibull with α = 2, β = 2
(g) Uniform with a = 2, b = 6
(h) Uniform with a = -3, b = 3
(i) Exponential with β = 12.5
(j) Exponential with β = 4.8
(k) Normal with µ = 5, σ = 5
(l) Normal with µ = 150, σ = 25
(m) Student t with ν = 4
(n) Student t with ν = 12
(o) Chi-square with ν = 4
(p) Chi-square with ν = 12
(q) F distribution with νN = 4, νD = 10
(r) F distribution with νN = 4, νD = 10

[13]. Let X be a continuous random variable that follows the Gamma probability distribution with parameters α and β. Let P0 = P(X ≤ x). Calculate:

(a) P0 given α = 2, β = 3, x = 3.5
(b) α given P0 = 0.40, β = 1.5, x = 1.2
(c) β given P0 = 0.60, α = 5, x = 10.5
(d) x given P0 = 0.20, α = 10.5, β = 0.3

[14]. Let X be a continuous random variable that follows the Beta probability distribution with parameters α and β. Let P0 = P(X ≤ x). Calculate:

(a) P0 given α = 2, β = 3.5, x = 0.35
(b) α given P0 = 0.40, β = 2.3, x = 0.76
(c) β given P0 = 0.60, α = 2.5, x = 0.45
(d) x given P0 = 0.20, α = 10.5, β = 0.3

[15]. Let T be a continuous random variable that follows the Student t distribution with ν degrees of freedom. Let P0 = P(T ≤ t). Calculate:

(a) P0 given ν = 10, t = 1.5
(b) ν given P0 = 0.40, t = -0.8
(c) t given P0 = 0.20, ν = 8

[16]. Let Χ² be a continuous random variable that follows the chi-square distribution with ν degrees of freedom. Let P0 = P(Χ² ≤ χ²). Calculate:

(a) P0 given ν = 6, χ² = 2.25
(b) ν given P0 = 0.40, χ² = -0.8
(c) χ² given P0 = 0.20, ν = 12

[17]. Let F be a continuous random variable that follows the F distribution with νN degrees of freedom in the numerator and νD degrees of freedom in the denominator. Let P0 = P(F ≤ F). Calculate:

(a) P0 given νN = 4, νD = 10, F = 2.5
(b) νN given P0 = 0.40, νD = 15, F = 3.2
(c) νD given P0 = 0.60, νN = 3, F = 0.45
(d) F given P0 = 0.20, νN = 8, νD = 12


[18]. The following data represent measurements of the diameter of a cylinder produced for a precision mechanism:

232. 248. 242. 250. 239. 244. 265. 262. 259. 236.
246. 308. 221. 275. 261. 217. 260. 273. 228. 269.
260. 247. 228. 274. 205. 254. 230. 252. 263. 255.
244. 264. 243. 255. 261. 236. 226. 264. 260. 265.
267. 243. 270. 275. 260. 281. 240. 257. 268. 231.

(a) Use function histnorm with a suitable number of classes to plot a histogram of the data as well as the corresponding normal curve. (b) Use function normplot to produce a normal probability plot of the data. (c) Based on these two plots, how well do the data follow the normal distribution?

[19]. The following data set represents the time to failure, in years, of light bulbs.

1.39 1.07 3.22 3.67 .55  .81  1.22 1.26 .05  1.54
.97  1.01 .44  1.97 1.9  .89  3.25 .85  1.04 .43
1.33 .82  2.04 1.02 .53  .13  2.06 2.96 1.96 1.5
3.05 .42  1.17 1.72 2.68 .56  2.13 1.56 2.09 1.26
3.21 .74  3.04 2.74 .83  .79  1.56 1.55 .96  1.23


[20]. The following data set represents the yearly rainfall depth, in mm, recorded at a certain location:

126.  82.9  41.5  4.35  346.  102.  830.  12.8  366.  471.
408.  189.  646.  7.82  313.  17.4  165.  24.5  32.6  39.3
277.  13.7  52.3  171.  314.  60.6  29.1  468.  887.  44.5
135.  215.  106.  201.  51.   43.   335.  59.4  174.  870.


[21]. The following data set represents the number of vehicles stopping at a service station in a given hour:

3.  5.  6.  4.  5.  9.  4.  4.  11. 4.
4.  8.  5.  4.  4.  6.  7.  4.  7.  8.
6.  9.  10. 7.  4.  3.  5.  9.  9.  11.
6.  5.  9.  12. 11. 5.  13. 8.  10. 6.
4.  5.  9.  8.  7.  5.  3.  6.  5.  5.
8.  3.  11. 4.  5.  9.  5.  1.  8.  6.


[22]. Generate data sets consisting of k values that follow the indicated distribution with the parameters listed below. Use functions histnorm and normplot to produce a histogram and a normal probability plot of the data. Based on the histogram and probability plot, how well do the data thus generated follow the normal distribution?

(a) Binomial, k = 200, n = 30, p = 0.7
(b) Poisson, k = 300, λ = 14.5
(c) Beta, k = 150, α = 3.5, β = 5.2
(d) Gamma, k = 100, α = 3.5, β = 5.2
(e) Exponential, k = 500, µ = 5.75
(f) Normal, k = 180, µ = 5.75, σ = 1.2
(g) Chi-square, k = 230, ν = 5
(h) F-distribution, k = 350, νN = 5, νD = 5
(i) Uniform integer, k = 125, a = -50, b = 50
(j) Uniform real, k = 200, a = 5.5, b = 17.5
(k) Weibull, k = 200, α = 7.2, β = 2.1
(l) Student’s t, k = 150, ν = 12
(m) Log-normal, k = 200, µln(X) = 1.2, σln(X) = 0.5

[23]. Generate a data set consisting of 250 values that follow the discrete distribution described by the following probability mass function:

x       1.2   2.3   4.1   5.2   6.1   7.2   8.4   9.3   11.1
fX(x)   0.04  0.08  0.12  0.16  0.08  0.04  0.20  0.24  0.04

Use functions histnorm and normplot to produce a histogram and a normal probability plot of the data. Based on the histogram and probability plot, how well do the data thus generated follow the normal distribution?

[24]. Function service was developed to simulate the traffic through a service station. Use function service to produce a simulation of traffic through a service station that takes as input 50 values of the inter-arrival time (IAT) and 50 values of the time of service (TS). Generate these values from the following probability mass functions, converting them to cumulative probabilities before using function discrand:

x = IAT   fX(x)     x = TS   fX(x)
  0.2     0.03       0.4     0.05
  0.4     0.14       0.8     0.15
  0.6     0.08       1.2     0.35
  0.8     0.12       1.6     0.25
  1.0     0.23       2.0     0.15
  1.2     0.10       2.4     0.05
  1.4     0.05
  1.6     0.05
  1.8     0.10
  2.0     0.10

Use functions histnorm and normplot to produce a histogram and a normal probability plot of the waiting time (WT) and the number of customers waiting (NW). How well do the WT and NW data follow the normal distribution?

[25]. One-dimensional random walk. Consider a particle that moves along a straight linesubject to a random motion. The particle starts at x1 = 0 and moves to position x2 = x1 + ∆x1,where ∆x1 is a random number. The next position of the particle is x3 = x2 + ∆x2, where ∆x2 is asecond random number. Subsequent positions of the particle are given by xk+1 = xk + ∆xk. The


random numbers used must include both positive and negative values so that the particle canmove forward and backward.

(a) Plot the position $x_k$ vs. $k$ for a one-dimensional random walk that involves 300 displacements $\Delta x_k$ generated from a normal distribution with µ = 0 and σ = 1.

(b) Plot the position $x_k$ vs. $k$ for a one-dimensional random walk that involves 300 displacements $\Delta x_k$ generated from a uniform distribution between -1 and 1.
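Since each position is the previous one plus a displacement, the positions are simply the running (cumulative) sums of the displacements, starting from $x_1 = 0$. A Python sketch of both parts follows; `random_walk_1d` is a hypothetical helper name, and plotting is left to whatever graphics tool is at hand.

```python
import random
from itertools import accumulate


def random_walk_1d(steps):
    """Positions x_1 = 0 and x_{k+1} = x_k + dx_k for the displacements dx_k."""
    return [0.0] + list(accumulate(steps))


rng = random.Random(2001)
# (a) 300 displacements from a normal distribution with mu = 0, sigma = 1
walk_a = random_walk_1d([rng.gauss(0.0, 1.0) for _ in range(300)])
# (b) 300 displacements uniformly distributed between -1 and 1
walk_b = random_walk_1d([rng.uniform(-1.0, 1.0) for _ in range(300)])
# each walk holds 301 positions x_1, ..., x_301, to be plotted against k
```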

[26]. Two-dimensional random walk. A two-dimensional random walk involves the displacement of a particle from a point $(x_k, y_k)$ to a point $(x_{k+1}, y_{k+1})$ so that

$x_{k+1} = x_k + r_k \cos(\theta_k)$, and $y_{k+1} = y_k + r_k \sin(\theta_k)$,

where the values $r_k$ and $\theta_k$ are random numbers.

(a) Plot the two-dimensional random walk that results from 200 values of $r_k$ with a normal distribution with mean µ = 1 and standard deviation σ = 0.2, and 200 values of $\theta_k$ uniformly distributed between 0 and 2π.

(b) Plot the two-dimensional random walk that results from 100 values of $r_k$ with a Weibull distribution with parameters α = 2 and β = 3, and 200 values of $\theta_k$ uniformly distributed between 0 and 2π.

(c) Plot the two-dimensional random walk that results from 150 values of $r_k$ with a Gamma distribution with parameters α = 0.2 and β = 1.3, and 200 values of $\theta_k$ normally distributed with mean µ = π and standard deviation σ = π/2.

(d) Plot the two-dimensional random walk that results from 250 values of $r_k$ with a Beta distribution with parameters α = 2 and β = 3, and 200 values of $\theta_k$ uniformly distributed between 0 and 2π.
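The recurrences for $x_{k+1}$ and $y_{k+1}$ translate directly into a loop that adds one step of length $r_k$ in direction $\theta_k$ at a time. The Python sketch below covers part (a) only and assumes the walk starts at the origin; `random_walk_2d` is a hypothetical name. Parts (b) to (d) differ only in how `r` and `theta` are generated, and their Weibull and gamma parameters would first need to be mapped to the conventions of whichever generator is used.

```python
import math
import random


def random_walk_2d(r, theta):
    """x_{k+1} = x_k + r_k cos(theta_k), y_{k+1} = y_k + r_k sin(theta_k)."""
    xs, ys = [0.0], [0.0]  # the walk is assumed to start at the origin
    for rk, tk in zip(r, theta):
        xs.append(xs[-1] + rk * math.cos(tk))
        ys.append(ys[-1] + rk * math.sin(tk))
    return xs, ys


rng = random.Random(26)
# part (a): r_k ~ Normal(mu = 1, sigma = 0.2), theta_k ~ Uniform(0, 2*pi), 200 steps
r = [rng.gauss(1.0, 0.2) for _ in range(200)]
theta = [rng.uniform(0.0, 2 * math.pi) for _ in range(200)]
xs, ys = random_walk_2d(r, theta)
```

Because `zip` stops at the shorter sequence, unequal numbers of $r_k$ and $\theta_k$ values produce only as many steps as the shorter list allows.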

[27]. The following table shows the annual maximum flow of the Ganga River in India, measured at a specific station.

Year  Q (m³/s)    Year  Q (m³/s)    Year  Q (m³/s)    Year  Q (m³/s)
1885      7241    1907      7546    1929      4545    1951      4458
1886      9164    1908     11504    1930      5998    1952      3919
1887      7407    1909      8335    1931      3470    1953      5470
1888      6870    1910     15077    1932      6155    1954      5978
1889      9855    1911      6493    1933      5267    1955      4644
1890     11887    1912      8335    1934      6193    1956      6381
1891      8827    1913      3579    1935      5289    1957      4548
1892      7546    1914      9299    1936      3320    1958      4056
1893      8498    1915      7407    1937      3232    1959      4493
1894     16757    1916      4726    1938      3525    1960      3884
1895      9680    1917      8416    1939      2341    1961      4855
1896     14336    1918      4668    1940      2429    1962      5760
1897      8174    1919      6296    1941      3154    1963      9192
1898      8953    1920      8174    1942      6650    1964      3024
1899      7546    1921      9079    1943      4442    1965      2509
1900      6652    1922      7407    1944      4229    1966      4741
1901     11409    1923      5482    1945      5101    1967      5919
1902      9164    1924     19136    1946      4629    1968      3789
1903      7404    1925      9680    1947      4345    1969      4546
1904      8579    1926      3698    1948      4890    1970      3842
1905      9362    1927      7241    1949      3619    1971      4542
1906      7092    1928      3698    1950      5899

(a) Use function histnorm with a suitable number of classes to plot a histogram of the data as well as the corresponding normal curve.

(b) Use function normplot to produce a normal probability plot of the data.

(c) Based on these two plots, how well do the data follow the normal distribution?
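Besides the visual check, the straightness of a normal probability plot can be summarized by the correlation coefficient between the sorted data and the corresponding standard normal quantiles; values close to 1 indicate a nearly straight plot. The Python sketch below uses the plotting position $p_i = (i - 0.5)/n$, which is one common choice and may differ from the one used by normplot; the helper names `normal_plot_points` and `pearson_r` are hypothetical. The flows are copied from the table above.

```python
import statistics
from statistics import NormalDist

# Annual maximum flows Q (m3/s) of the Ganga River, 1885-1971, from problem [27]
Q = [
    7241, 9164, 7407, 6870, 9855, 11887, 8827, 7546, 8498, 16757, 9680,  # 1885-1895
    14336, 8174, 8953, 7546, 6652, 11409, 9164, 7404, 8579, 9362, 7092,  # 1896-1906
    7546, 11504, 8335, 15077, 6493, 8335, 3579, 9299, 7407, 4726, 8416,  # 1907-1917
    4668, 6296, 8174, 9079, 7407, 5482, 19136, 9680, 3698, 7241, 3698,   # 1918-1928
    4545, 5998, 3470, 6155, 5267, 6193, 5289, 3320, 3232, 3525, 2341,    # 1929-1939
    2429, 3154, 6650, 4442, 4229, 5101, 4629, 4345, 4890, 3619, 5899,    # 1940-1950
    4458, 3919, 5470, 5978, 4644, 6381, 4548, 4056, 4493, 3884, 4855,    # 1951-1961
    5760, 9192, 3024, 2509, 4741, 5919, 3789, 4546, 3842, 4542,          # 1962-1971
]


def normal_plot_points(data):
    """Return (z, q): standard normal quantiles z_i and sorted data q_i.

    Uses the plotting position p_i = (i - 0.5)/n, one common choice (assumed)."""
    q = sorted(data)
    n = len(q)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return z, q


def pearson_r(a, b):
    """Sample correlation coefficient; close to 1 for a nearly straight plot."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


z, q = normal_plot_points(Q)
print(f"n = {len(Q)}, mean = {statistics.fmean(Q):.1f} m3/s, "
      f"std. dev. = {statistics.stdev(Q):.1f} m3/s, r = {pearson_r(z, q):.4f}")
```

Plotting `q` against `z` reproduces the normal probability plot itself; the single number `r` is only a compact summary of how straight that plot is.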

The following problems require that you load the functions from the Stixbox SCILAB toolbox.

[28]. Using function getdata(), load data set number 1, described as:

__________________________________________________________________________________

************************ Phosphorus Data **********************************
Source: Snedecor, G. W. and Cochran, W. G. (1967), Statistical Methods (6th Edition), Iowa State University, Ames, Iowa, p. 384.
Taken From: Chatterjee and Hadi (1988), p. 82.
Dimension: 18 observations on 3 variables
Description: An investigation of the source from which corn plants obtain their phosphorus was carried out. Concentrations of phosphorus in parts per million in each of 18 soils were measured.

Column   Description
1        Concentrations of inorganic phosphorus in the soil
2        Concentrations of organic phosphorus in the soil
3        Phosphorus content of corn grown in the soil at 20 degrees C

__________________________________________________________________________________

(a) Separate the three columns of data into vectors x, y, and z, and use the user-defined function describe to obtain statistics of each of the columns of data.

(b) Use Stixbox function histo to obtain histograms of each of the columns of data.

(c) Use Stixbox function qqnorm to obtain a normal probability plot of each of the data columns.

[29]. Using function getdata(), load data set number 1, described as:

*********************** Scottish Hill Race Data *************************
(...lines removed...)

Column   Definition
1        Distance (miles)
2        Climb (ft)
3        Time (seconds)

__________________________________________________________________________________


(a) Separate the three columns of data into vectors x, y, and z, and use the user-defined function describe to obtain statistics of each of the columns of data.

(b) Use Stixbox function histo to obtain histograms of each of the columns of data.

(c) Use Stixbox function qqnorm to obtain a normal probability plot of each of the data columns.

REFERENCES (for all SCILAB documents at InfoClearinghouse.com)

Abramowitz, M. and I.A. Stegun (editors), 1965,"Handbook of Mathematical Functions with Formulas, Graphs, andMathematical Tables," Dover Publications, Inc., New York.

Arora, J.S., 1985, "Introduction to Optimum Design," Class notes, The University of Iowa, Iowa City, Iowa.

Asian Institute of Technology, 1969, "Hydraulic Laboratory Manual," AIT - Bangkok, Thailand.

Berge, P., Y. Pomeau, and C. Vidal, 1984,"Order within chaos - Towards a deterministic approach to turbulence," JohnWiley & Sons, New York.

Bras, R.L. and I. Rodriguez-Iturbe, 1985,"Random Functions and Hydrology," Addison-Wesley Publishing Company,Reading, Massachussetts.

Brogan, W.L., 1974,"Modern Control Theory," QPI series, Quantum Publisher Incorporated, New York.

Browne, M., 1999, "Schaum's Outline of Theory and Problems of Physics for Engineering and Science," Schaum'soutlines, McGraw-Hill, New York.

Farlow, Stanley J., 1982, "Partial Differential Equations for Scientists and Engineers," Dover Publications Inc., NewYork.

Friedman, B., 1956 (reissued 1990), "Principles and Techniques of Applied Mathematics," Dover Publications Inc., NewYork.

Gomez, C. (editor), 1999, “Engineering and Scientific Computing with Scilab,” Birkhäuser, Boston.

Gullberg, J., 1997, "Mathematics - From the Birth of Numbers," W. W. Norton & Company, New York.

Harman, T.L., J. Dabney, and N. Richert, 2000, "Advanced Engineering Mathematics with MATLAB® - Second edition,"Brooks/Cole - Thompson Learning, Australia.

Harris, J.W., and H. Stocker, 1998, "Handbook of Mathematics and Computational Science," Springer, New York.

Hsu, H.P., 1984, "Applied Fourier Analysis," Harcourt Brace Jovanovich College Outline Series, Harcourt BraceJovanovich, Publishers, San Diego.

Journel, A.G., 1989, "Fundamentals of Geostatistics in Five Lessons," Short Course Presented at the 28th InternationalGeological Congress, Washington, D.C., American Geophysical Union, Washington, D.C.

Julien, P.Y., 1998,”Erosion and Sedimentation,” Cambridge University Press, Cambridge CB2 2RU, U.K.

Keener, J.P., 1988, "Principles of Applied Mathematics - Transformation and Approximation," Addison-WesleyPublishing Company, Redwood City, California.

Kitanidis, P.K., 1997,”Introduction to Geostatistics - Applications in Hydogeology,” Cambridge University Press,Cambridge CB2 2RU, U.K.

Koch, G.S., Jr., and R. F. Link, 1971, "Statistical Analysis of Geological Data - Volumes I and II," Dover Publications,Inc., New York.

Korn, G.A. and T.M. Korn, 1968, "Mathematical Handbook for Scientists and Engineers," Dover Publications, Inc., NewYork.

Kottegoda, N. T., and R. Rosso, 1997, "Probability, Statistics, and Reliability for Civil and Environmental Engineers,"The Mc-Graw Hill Companies, Inc., New York.

Kreysig, E., 1983, "Advanced Engineering Mathematics - Fifth Edition," John Wiley & Sons, New York.


Lindfield, G. and J. Penny, 2000, "Numerical Methods Using Matlab®," Prentice Hall, Upper Saddle River, New Jersey.

Magrab, E.B., S. Azarm, B. Balachandran, J. Duncan, K. Herold, and G. Walsh, 2000, "An Engineer's Guide toMATLAB®", Prentice Hall, Upper Saddle River, N.J., U.S.A.

McCuen, R.H., 1989,”Hydrologic Analysis and Design - second edition,” Prentice Hall, Upper Saddle River, New Jersey.

Middleton, G.V., 2000, "Data Analysis in the Earth Sciences Using Matlab®," Prentice Hall, Upper Saddle River, NewJersey.

Montgomery, D.C., G.C. Runger, and N.F. Hubele, 1998, "Engineering Statistics," John Wiley & Sons, Inc.

Newland, D.E., 1993, "An Introduction to Random Vibrations, Spectral & Wavelet Analysis - Third Edition," LongmanScientific and Technical, New York.

Nicols, G., 1995, “Introduction to Nonlinear Science,” Cambridge University Press, Cambridge CB2 2RU, U.K.

Parker, T.S. and L.O. Chua, , "Practical Numerical Algorithms for Chaotic Systems,” 1989, Springer-Verlag, New York.

Peitgen, H-O. and D. Saupe (editors), 1988, "The Science of Fractal Images," Springer-Verlag, New York.

Peitgen, H-O., H. Jürgens, and D. Saupe, 1992, "Chaos and Fractals - New Frontiers of Science," Springer-Verlag, NewYork.

Press, W.H., B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling, 1989, “Numerical Recipes - The Art of ScientificComputing (FORTRAN version),” Cambridge University Press, Cambridge CB2 2RU, U.K.

Raghunath, H.M., 1985, "Hydrology - Principles, Analysis and Design," Wiley Eastern Limited, New Delhi, India.

Recktenwald, G., 2000, "Numerical Methods with Matlab - Implementation and Application," Prentice Hall, UpperSaddle River, N.J., U.S.A.

Rothenberg, R.I., 1991, "Probability and Statistics," Harcourt Brace Jovanovich College Outline Series, Harcourt BraceJovanovich, Publishers, San Diego, CA.

Sagan, H., 1961,"Boundary and Eigenvalue Problems in Mathematical Physics," Dover Publications, Inc., New York.

Spanos, A., 1999,"Probability Theory and Statistical Inference - Econometric Modeling with Observational Data,"Cambridge University Press, Cambridge CB2 2RU, U.K.

Spiegel, M. R., 1971 (second printing, 1999), "Schaum's Outline of Theory and Problems of Advanced Mathematics forEngineers and Scientists," Schaum's Outline Series, McGraw-Hill, New York.

Tanis, E.A., 1987, "Statistics II - Estimation and Tests of Hypotheses," Harcourt Brace Jovanovich College OutlineSeries, Harcourt Brace Jovanovich, Publishers, Fort Worth, TX.

Tinker, M. and R. Lambourne, 2000, "Further Mathematics for the Physical Sciences," John Wiley & Sons, LTD.,Chichester, U.K.

Tolstov, G.P., 1962, "Fourier Series," (Translated from the Russian by R. A. Silverman), Dover Publications, New York.

Tveito, A. and R. Winther, 1998, "Introduction to Partial Differential Equations - A Computational Approach," Texts inApplied Mathematics 29, Springer, New York.

Urroz, G., 2000, "Science and Engineering Mathematics with the HP 49 G - Volumes I & II", www.greatunpublished.com,Charleston, S.C.

Urroz, G., 2001, "Applied Engineering Mathematics with Maple", www.greatunpublished.com, Charleston, S.C.

Winnick, J., , "Chemical Engineering Thermodynamics - An Introduction to Thermodynamics for UndergraduateEngineering Students," John Wiley & Sons, Inc., New York.
