Multiple Regression Analysis Multivariate Analysis

Page 1: Multiple Regression Analysis Multivariate Analysis

Multiple Regression Analysis

Multivariate Analysis

Page 2: Multiple Regression Analysis Multivariate Analysis

Multivariate Analysis

Statistical methods that allow the simultaneous investigation of more than two variables.

The techniques can be used to summarize data and reduce the number of variables necessary to describe the data.

Several of the more common multivariate techniques:

Multiple Regression Analysis

Multiple Discriminant Analysis

Factor Analysis

Cluster Analysis

Multidimensional Scaling

Page 3: Multiple Regression Analysis Multivariate Analysis

Regression Analysis

Regression analysis is concerned with the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating and/or predicting the mean value of the former in terms of known or fixed values of the latter.

Page 4: Multiple Regression Analysis Multivariate Analysis

Multiple Regression Analysis

Multiple regression analysis is used to determine the association or relationship between the dependent and independent variables.

In multiple regression analysis, two or more independent variables are included in the analysis.

The general form of the multiple regression model is as follows:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon_i$$

Where

$\beta_0$ = Y intercept of the regression model

$\beta_1, \beta_2, \dots, \beta_n$ = Slopes of the regression model

$\varepsilon_i$ = Random error
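The slides do not say how the coefficients are estimated. A minimal sketch, assuming ordinary least squares (the usual choice) and using only the Python standard library, could look like the following. The function name `fit_least_squares` and the four data points are hypothetical illustrations, not the Salsberry sample.

```python
def fit_least_squares(rows, y):
    """Estimate [b0, b1, ..., bn] by ordinary least squares (assumed method).

    rows: list of observations, each a list of n independent-variable values.
    y:    list of dependent-variable values, one per observation.
    """
    # Design matrix: a leading 1 in every row carries the intercept b0.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])

    # Normal equations (X'X) b = X'y, stored as an augmented matrix [X'X | X'y].
    A = []
    for i in range(p):
        xtx_row = [sum(r[i] * r[j] for r in X) for j in range(p)]
        xty = sum(r[i] * yi for r, yi in zip(X, y))
        A.append(xtx_row + [xty])

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("independent variables are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]

    # The last column now holds the estimated coefficients.
    return [A[i][p] for i in range(p)]


# Hypothetical data: [outside temperature, inches of insulation] -> heating cost.
rows = [[20, 3], [20, 9], [40, 3], [40, 9]]
cost = [252, 188, 148, 92]
b0, b1, b2 = fit_least_squares(rows, cost)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

For this made-up data the estimates should come out, up to rounding, as an intercept of 380 and slopes of -5 and -10. In practice a statistics package would be used instead of hand-rolled elimination.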

Page 5: Multiple Regression Analysis Multivariate Analysis

Partial Regression Coefficient

Denotes the change in the computed value, $Y_c$, per one unit change in $X_i$ when all other independent variables are held constant.
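As a small illustration of that definition, the sketch below raises one independent variable by one unit while holding the other fixed and checks that the computed value changes by exactly that variable's coefficient. The coefficients and the function name `computed_value` are hypothetical.

```python
def computed_value(b0, slopes, xs):
    """Y_c = b0 + b1*X1 + ... + bn*Xn for a single observation."""
    return b0 + sum(b * x for b, x in zip(slopes, xs))


# Hypothetical coefficients: intercept, then slopes for temperature and insulation.
b0 = 380.0
slopes = [-5.0, -10.0]

base = computed_value(b0, slopes, [30.0, 6.0])
# Raise X1 (temperature) by one unit while X2 (insulation) is held constant.
bumped = computed_value(b0, slopes, [31.0, 6.0])

change = bumped - base  # equals the partial regression coefficient b1
print(change)
```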

Page 6: Multiple Regression Analysis Multivariate Analysis

Coefficient of Multiple Determination

The percentage of the variance in the dependent variable that is explained by the variation in the independent variables.

$$R^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS}$$

where

TSS = total sum of squares = $\sum (Y - \bar{Y})^2$

RSS = regression sum of squares = $\sum (Y_c - \bar{Y})^2$

ESS = error sum of squares = $\sum (Y - Y_c)^2$
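The three sums of squares can be computed directly from the observed values and the computed values. The sketch below uses hypothetical numbers, chosen so that the computed values are a genuine least-squares fit; that is the condition under which TSS = RSS + ESS and the two forms of $R^2$ agree.

```python
def sums_of_squares(y, y_c):
    """Return (TSS, RSS, ESS) for observed values y and computed values y_c."""
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)            # total
    rss = sum((ci - mean_y) ** 2 for ci in y_c)          # regression (explained)
    ess = sum((yi - ci) ** 2 for yi, ci in zip(y, y_c))  # error (unexplained)
    return tss, rss, ess


# Hypothetical observed heating costs and their least-squares computed values.
y = [252.0, 188.0, 148.0, 92.0]
y_c = [250.0, 190.0, 150.0, 90.0]

tss, rss, ess = sums_of_squares(y, y_c)
r2 = rss / tss
r2_alt = 1 - ess / tss  # same quantity, written the other way
print(tss, rss, ess, round(r2, 4))
```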

Page 7: Multiple Regression Analysis Multivariate Analysis

Adjusted Coefficient of Determination

Adding independent variables to a multiple regression equation makes the coefficient of determination larger. Each new independent variable lets the equation fit the sample data at least as closely as before, whether or not the variable is truly related to the dependent variable.

To balance the effect that the number of independent variables has on the coefficient of multiple determination, we use an adjusted coefficient of multiple determination.
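The slides do not give the adjustment formula. A sketch using the standard definition, $1 - (1 - R^2)\frac{n-1}{n-k-1}$, with $n$ the sample size and $k$ the number of independent variables (the notation used later in the deck for the F test), is shown below. The input values are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for sample size n and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than coefficients")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical: the same R^2 = 0.80 reported with 3 and then 4 independent variables.
with_three = adjusted_r2(0.80, n=20, k=3)
with_four = adjusted_r2(0.80, n=20, k=4)
print(round(with_three, 4), round(with_four, 4))
```

With the unadjusted value held fixed, the adjusted value drops when a variable is added, which is the penalty the adjustment is meant to apply.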

Page 8: Multiple Regression Analysis Multivariate Analysis

Multiple Linear Regression - Example

Salsberry Realty sells homes along the east coast of the United States. One of the questions most frequently asked by prospective buyers is: If we purchase this home, how much can we expect to pay to heat it during the winter? The research department at Salsberry has been asked to develop some guidelines regarding heating costs for single-family homes. Three variables are thought to relate to the heating costs: (1) the mean daily outside temperature, (2) the number of inches of insulation in the attic, and (3) the age in years of the furnace.

To investigate, Salsberry’s research department selected a random sample of 20 recently sold homes. It determined the cost to heat each home last January, as well

Page 9: Multiple Regression Analysis Multivariate Analysis


Page 10: Multiple Regression Analysis Multivariate Analysis

Testing the Multiple Regression Model

The test is used to investigate whether any of the independent variables have significant coefficients.

The hypotheses are:

$$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$$

$$H_1: \text{Not all } \beta\text{s equal } 0$$

Page 11: Multiple Regression Analysis Multivariate Analysis

Testing the Multiple Regression Model

The test statistic follows the F distribution with k (number of independent variables) and n-(k+1) degrees of freedom, where n is the sample size.

Decision Rule:

Reject $H_0$ if $F > F_{\alpha,\,k,\,n-k-1}$
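The decision rule can be sketched in a few lines. The slides do not show how F is computed, so the sketch assumes the usual ratio of mean squares, (RSS/k) / (ESS/(n-k-1)). The critical value is passed in rather than computed; 3.24 is the tabulated F value for 3 and 16 degrees of freedom at an assumed significance level of 0.05. The sums of squares are hypothetical; 21.90 is the F value the slides report for the heating-cost example.

```python
def f_statistic(rss, ess, n, k):
    """Global F statistic: mean square regression over mean square error."""
    return (rss / k) / (ess / (n - (k + 1)))


def reject_h0(f_value, f_critical):
    """Decision rule: reject H0 when F exceeds the critical value."""
    return f_value > f_critical


# Hypothetical sums of squares for n = 20 homes and k = 3 independent variables.
f_value = f_statistic(rss=600.0, ess=400.0, n=20, k=3)
print(f_value, reject_h0(f_value, f_critical=3.24))

# The F value reported in the slides, against the same assumed critical value.
print(reject_h0(21.90, f_critical=3.24))
```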

Page 12: Multiple Regression Analysis Multivariate Analysis

The ANOVA Table

The ANOVA (Analysis of Variance) table reports the variation in the dependent variable. The variation is divided into two components.

The Explained Variation is that accounted for by the set of independent variables.

The Unexplained or Random Variation is not accounted for by the independent variables.
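A rough sketch of how such a table is laid out is given below: one row for the explained (regression) variation, one for the unexplained (error) variation, and a total row. The function name `anova_table` and the sums of squares are hypothetical; the table for the Salsberry data is not reproduced in this transcript.

```python
def anova_table(rss, ess, n, k):
    """Rows of an ANOVA table for n observations and k independent variables."""
    df_regression = k
    df_error = n - (k + 1)
    ms_regression = rss / df_regression  # mean square = sum of squares / df
    ms_error = ess / df_error
    return [
        {"source": "Regression", "df": df_regression, "ss": rss,
         "ms": ms_regression, "f": ms_regression / ms_error},
        {"source": "Error", "df": df_error, "ss": ess,
         "ms": ms_error, "f": None},
        {"source": "Total", "df": n - 1, "ss": rss + ess,
         "ms": None, "f": None},
    ]


# Hypothetical sums of squares for n = 20 and k = 3.
table = anova_table(rss=600.0, ess=400.0, n=20, k=3)
for row in table:
    print(row)
```

The degrees of freedom and the sums of squares of the two components add up to the totals, which mirrors the division of the variation described above.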

Page 13: Multiple Regression Analysis Multivariate Analysis

ANOVA Table

Page 14: Multiple Regression Analysis Multivariate Analysis

Interpretation

The computed value of F is 21.90, which is in the rejection region.

The null hypothesis that all the multiple regression coefficients are zero is therefore rejected.

Interpretation: at least one of the independent variables (amount of insulation, etc.) does have the ability to explain the variation in the dependent variable (heating cost).

Logical question – which ones?

Page 15: Multiple Regression Analysis Multivariate Analysis

Evaluating Individual Regression Coefficients ($\beta_i = 0$)

This test is used to determine which independent variables have nonzero regression coefficients.

The variables whose regression coefficients are not significantly different from zero are usually dropped from the analysis.

The test statistic follows the t distribution with n-(k+1) degrees of freedom.

The hypothesis test is as follows:

$H_0: \beta_i = 0$

$H_1: \beta_i \neq 0$

Reject $H_0$ if $t > t_{\alpha/2,\,n-k-1}$ or $t < -t_{\alpha/2,\,n-k-1}$
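A minimal sketch of this two-tailed rule follows. It assumes the usual test statistic, the estimated coefficient divided by its standard error, which the slides do not spell out. The coefficients and standard errors are hypothetical; 2.120 is the tabulated t value for 16 degrees of freedom at an assumed two-tailed significance level of 0.05.

```python
def t_statistic(b_i, se_b_i):
    """t = (b_i - 0) / s_{b_i}, testing H0: beta_i = 0."""
    return b_i / se_b_i


def reject_h0_two_tailed(t_value, t_critical):
    """Reject H0 if t falls in either tail beyond the critical value."""
    return t_value > t_critical or t_value < -t_critical


T_CRITICAL = 2.120  # assumed: alpha = 0.05, n - (k + 1) = 16 degrees of freedom

# Hypothetical estimates: (coefficient, standard error) per independent variable.
estimates = {"temperature": (-5.0, 1.0), "furnace age": (6.0, 4.0)}
for name, (b_i, se) in estimates.items():
    t_value = t_statistic(b_i, se)
    print(name, t_value, reject_h0_two_tailed(t_value, T_CRITICAL))
```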

Page 16: Multiple Regression Analysis Multivariate Analysis

Critical t-stat for the Slopes

Page 17: Multiple Regression Analysis Multivariate Analysis

Interpretation

The t-test shows that mean outside temperature and attic insulation are significantly (p-value < 0.05) and negatively associated with cost of heating, but age of furnace has no significant (p-value > 0.05) relationship with cost of heating.

Page 18: Multiple Regression Analysis Multivariate Analysis

Qualitative Independent Variables

Frequently we wish to use nominal-scale variables (such as gender, whether the home has a swimming pool, or whether the sports team was the home or the visiting team) in our analysis. These are called qualitative variables.

To use a qualitative variable in regression analysis, we use a scheme of dummy variables in which one of the two possible conditions is coded 0 and the other 1.

Page 19: Multiple Regression Analysis Multivariate Analysis

Qualitative Independent Variables

Use of dummy variables

Dummy variables are used when a nominal scale variable is to be included in the regression.

When there are two categories of the variable, then one dummy variable is used.

When there are n categories, then n-1 dummy variables are used.
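The n-1 rule can be sketched as a small coding function. The first category listed serves as the baseline and is coded as all zeros; that choice, the function name `dummy_code`, and the three-category fuel example are assumptions made for illustration.

```python
def dummy_code(value, categories):
    """Return n-1 dummy variables (0/1) for a variable with n categories.

    The first entry of `categories` is the baseline, coded as all zeros.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories[1:]]


# Two categories -> one dummy variable (swimming pool: no = 0, yes = 1).
print(dummy_code("yes", ["no", "yes"]))
print(dummy_code("no", ["no", "yes"]))

# Three categories (hypothetical furnace fuel) -> two dummy variables.
fuels = ["gas", "oil", "electric"]
for fuel in fuels:
    print(fuel, dummy_code(fuel, fuels))
```

Using n-1 rather than n dummy variables avoids a column that is an exact linear combination of the others and the intercept.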

Page 20: Multiple Regression Analysis Multivariate Analysis

Suppose in the Salsberry Realty example that the independent variable “garage” is added. For those homes without an attached garage, 0 is used; for homes with an attached garage, a 1 is used.
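In data terms, the garage variable simply becomes a fourth column of 0s and 1s next to temperature, insulation, and furnace age. The sketch below shows that coding on three hypothetical home records; the field names and values are made up and are not the Salsberry sample.

```python
# Hypothetical home records; "garage" is True when the garage is attached.
homes = [
    {"temp": 35, "insulation": 3, "age": 6, "garage": False},
    {"temp": 29, "insulation": 4, "age": 10, "garage": True},
    {"temp": 36, "insulation": 7, "age": 3, "garage": False},
]

# One row of independent variables per home; garage is coded 0 or 1.
rows = [
    [h["temp"], h["insulation"], h["age"], 1 if h["garage"] else 0]
    for h in homes
]
for row in rows:
    print(row)
```

The coefficient estimated for this column then measures the difference in computed heating cost between homes with and without an attached garage, with the other three variables held constant.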