16
Matlab: Statistics 1. Probability distributions 2. Hypothesis tests 3. Response surface modeling 4. Design of experiments

Matlab: Statistics 1.Probability distributions 2.Hypothesis tests 3.Response surface modeling 4.Design of experiments

Embed Size (px)

Citation preview

Page 1: Matlab: Statistics 1.Probability distributions 2.Hypothesis tests 3.Response surface modeling 4.Design of experiments

Matlab: Statistics

1. Probability distributions

2. Hypothesis tests

3. Response surface modeling

4. Design of experiments

Page 2: Matlab: Statistics 1.Probability distributions 2.Hypothesis tests 3.Response surface modeling 4.Design of experiments

Statistics Toolbox Capabilities

Descriptive statistics

Statistical visualization

Probability distributions

Hypothesis tests

Linear models

Nonlinear models

Multivariate statistics

Statistical process control

Design of experiments

Hidden Markov models

Page 3: Matlab: Statistics 1.Probability distributions 2.Hypothesis tests 3.Response surface modeling 4.Design of experiments

Probability Distributions

21 continuous distributions for data analysis » Includes normal distribution

6 continuous distributions for statistics» Includes chi-square and t distributions

8 discrete distributions» Includes binomial and Poisson distributions

Each distribution has functions for:» pdf — Probability density function» cdf — Cumulative distribution function» inv — Inverse cumulative distribution» functionsstat — Distribution statistics function» fit — Distribution fitting function» like — Negative log-likelihood function» rnd — Random number generator

Page 4: Matlab: Statistics 1.Probability distributions 2.Hypothesis tests 3.Response surface modeling 4.Design of experiments

Normal Distribution Functions

normpdf – probability distribution function normcdf – cumulative distribution function norminv – inverse cumulative distribution

function normstat – mean and variance normfit – parameter estimates and confidence

intervals for normally distributed data normlike – negative log-likelihood for

maximum likelihood estimation normrnd – random numbers from normal

distribution

Page 5: Matlab: Statistics 1.Probability distributions 2.Hypothesis tests 3.Response surface modeling 4.Design of experiments

Hypothesis Tests

17 hypothesis tests available chi2gof – chi-square goodness-of-fit test. Tests if a

sample comes from a specified distribution, against the alternative that it does not come from that distribution.

ttest – one-sample or paired-sample t-test. Tests if a sample comes from a normal distribution with unknown variance and a specified mean, against the alternative that it does not have that mean.

vartest – one-sample chi-square variance test. Tests if a sample comes from a normal distribution with specified variance, against the alternative that it comes from a normal distribution with a different variance.

Page 6: Matlab: Statistics 1.Probability distributions 2.Hypothesis tests 3.Response surface modeling 4.Design of experiments

Mean Hypothesis Test Example

>> h = ttest(data,m,alpha,tail) data: vector or matrix of data m: expected mean alpha: significance level Tail = ‘left’ (left handed alternative), ‘right’ (right

handed alternative) or ‘both’ (two-sided alternative) h = 1 (reject hypothesis) or 0 (accept hypothesis) Measurements of polymer molecular weight

Hypothesis: 0 = 1.3 instead of 1 < 0

>> h = ttest(x,1.3,0.1,'left')

h = 1

26.131.127.127.112.133.119.122.136.125.1

0049.0258.1 2 sx

Page 7: Matlab: Statistics 1.Probability distributions 2.Hypothesis tests 3.Response surface modeling 4.Design of experiments

Variance Hypothesis Test Example

>> h = vartest(data,v,alpha,tail) data: vector or matrix of data v: expected variance alpha: significance level Tail = ‘left’ (left handed alternative), ‘right’ (right

handed alternative) or ‘both’ (two-sided alternative)

h = 1 (reject hypothesis) or 0 (accept hypothesis) Hypothesis: 2 = 0.0049 and not a different

variance

>> h = vartest(x,0.0049,0.1,'both')

h = 0

Page 8: Matlab: Statistics 1.Probability distributions 2.Hypothesis tests 3.Response surface modeling 4.Design of experiments

Goodness of Fit

Perform hypothesis test to determine if data comes from a normal distribution

Usage: [h,p,stats]=chi2gof(x,’edges’,edges) » x: data vector» edges: data divided into intervals with the

specified edges» h = 1, reject hypothesis at 5% significance» h = 0, accept hypothesis at 5% significance» p: probability of observing the given statistic» stats: includes chi-square statistic and degrees of

freedom

Page 9: Matlab: Statistics 1.Probability distributions 2.Hypothesis tests 3.Response surface modeling 4.Design of experiments

Goodness of Fit Example

Find maximum likelihood estimates for µ and σ of a normal distribution

>> data=[320 … 360];

>> phat = mle(data)

phat = 364.7 26.7 Test if data comes from a normal distribution

>> [h,p,stats]=chi2gof(data,’edges’,[-inf,325:10:405,inf]);

>> h = 0

>> p = 0.8990

>> chi2stat = 2.8440

>> df = 7

Page 10: Matlab: Statistics 1.Probability distributions 2.Hypothesis tests 3.Response surface modeling 4.Design of experiments

Response Surface Modeling

Develop linear and quadratic regression models from data

Commonly termed response surface modeling Usage: rstool(x,y,model)

» x: vector or matrix of input values» y: vector or matrix of output values» model: ‘linear’ (constant and linear terms), ‘interaction’

(linear model plus interaction terms), ‘quadratic’ (interaction model plus quadratic terms), ‘pure quadratic’ (quadratic model minus interaction terms)

» Creates graphical user interface for model analysis

effects Quadratic

effectsn interactioBinary effectsMain Bias2333

2222

2111

322311321123322110

xxx

xxxxxxxxxy

Page 11: Matlab: Statistics 1.Probability distributions 2.Hypothesis tests 3.Response surface modeling 4.Design of experiments

Response Surface Model Example

VLE data – liquid composition held constant

>> x = [300 1; 275 1; 250 1; 300 0.75; 275 0.75; 250 0.75; 300 1.25; 275 1.25; 250 1.25];

>> y = [0.75; 0.77; 0.73; 0.81; 0.80; 0.76; 0.72; 0.74; 0.71];

Experiment 1 2 3 4 5 6 7 8 9

Temperature 300 275 250 300 275 250 300 275 250

Pressure 1.0 1.0 1.0 0.75 0.75 0.75 1.25 1.25 1.25

Vapor Composition

0.75 0.77 0.73 0.81 0.80 0.76 0.72 0.74 0.71

Page 12: Matlab: Statistics 1.Probability distributions 2.Hypothesis tests 3.Response surface modeling 4.Design of experiments

Response Surface Model Example cont.

>> rstool(x,y,'linear')

>> beta = 0.7411 (bias)

0.0005 (T)

-0.1333 (P)

>> rstool(x,y,'interaction')

>> beta2 = 0.3011 (bias)

0.0021 (T)

0.3067 (P)

-0.0016 (T*P)

>> rstool(x,y,'quadratic')

>> beta3 = -2.4044 (bias)

0.0227 (T)

0.0933 (P)

-0.0016 (T*P)

-0.0000 (T*T)

0.1067 (P*P)

Page 13: Matlab: Statistics 1.Probability distributions 2.Hypothesis tests 3.Response surface modeling 4.Design of experiments

Design of Experiments

Full factorial designs

Fractional factorial designs

Response surface designs» Central composite designs

» Box-Behnken designs

D-optimal designs – minimize the volume of the confidence ellipsoid of the regression estimates of the linear model parameters

Page 14: Matlab: Statistics 1.Probability distributions 2.Hypothesis tests 3.Response surface modeling 4.Design of experiments

Full Factorial Designs

>> d = fullfact(L1,…,Lk)

L1: number of levels for first factor

Lk: number of levels for last (kth) factor

d: design matrix

>> d = ff2n(k)

k: number of factors

d: design matrix for two levels

>> d = ff2n(3)

d =0 0 00 0 10 1 00 1 11 0 01 0 11 1 01 1 1

Page 15: Matlab: Statistics 1.Probability distributions 2.Hypothesis tests 3.Response surface modeling 4.Design of experiments

Fractional Factorial Designs

>> [d,conf] = fracfact(gen)

gen: generator string for the design

d: design matrix

conf: cell array that describes the confounding pattern

>> [x,conf] = fracfact('a b c abc')

x =-1 -1 -1 -1-1 -1 1 1-1 1 -1 1-1 1 1 -1 1 -1 -1 1 1 -1 1 -1 1 1 -1 -1 1 1 1 1

Page 16: Matlab: Statistics 1.Probability distributions 2.Hypothesis tests 3.Response surface modeling 4.Design of experiments

Fractional Factorial Designs cont.

>> gen = fracfactgen(model,K,res) model: string containing terms that must be estimable

in the design K: 2K total experiments in the design res: resolution of the design gen: generator string for use in fracfact

>> gen = fracfactgen('a b c d e f g',4,4)

gen = 'a''b''c''d''bcd''acd''abd'