Multiple Regression Analysis
Multivariate Analysis
Statistical methods that allow the simultaneous investigation of more than two variables.
The techniques can be used to summarize data and reduce the number of variables necessary to describe the data.
Several of the more common multivariate techniques:
Multiple Regression Analysis
Multiple Discriminant Analysis
Factor Analysis
Cluster Analysis
Multidimensional Scaling
Regression Analysis
Regression analysis is concerned with the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating and/or predicting the mean value of the former in terms of known or fixed values of the latter.
Multiple Regression Analysis
Multiple regression analysis is used to determine the association or relationship between dependent and independent variables.
In multiple regression analysis, two or more independent variables are included in the examination.
The general form of the multiple regression model is as follows:
$$Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon_i$$
where
$\beta_0$ = Y intercept of the regression model
$\beta_1, \beta_2, \dots, \beta_n$ = slopes of the regression model
$\varepsilon_i$ = random error
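As a minimal sketch of how the coefficients of this model can be estimated, the code below solves the least-squares normal equations in plain Python. The helper names (`solve_linear_system`, `fit_multiple_regression`) and the data are hypothetical, chosen only for illustration; the data are generated without an error term so the recovered coefficients can be checked by eye.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_multiple_regression(xs, ys):
    """Return [b0, b1, ..., bn] that minimise the sum of squared errors."""
    # Prepend a column of ones so that b0 plays the role of the intercept.
    design = [[1.0] + list(row) for row in xs]
    p = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2 with no random error.
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
coefficients = fit_multiple_regression(xs, ys)
print(coefficients)  # approximately [1.0, 2.0, 3.0]
```

In practice a statistics package would be used instead; the sketch only shows where the intercept and slopes come from.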
Partial Regression Coefficient
A partial regression coefficient denotes the change in the computed value, $Y_c$, per one-unit change in $X_i$ when all other independent variables are held constant.
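The "held constant" interpretation can be checked numerically. The sketch below uses hypothetical coefficients (they are not estimates from any data set in these slides) and a hypothetical helper `predict`; raising $X_1$ by one unit while leaving $X_2$ unchanged shifts the computed value by exactly the coefficient on $X_1$.

```python
def predict(coefficients, x_values):
    """Computed value Y_c = b0 + b1*X1 + ... + bn*Xn."""
    intercept, slopes = coefficients[0], coefficients[1:]
    return intercept + sum(b * x for b, x in zip(slopes, x_values))


# Hypothetical coefficients: b0 = 10, b1 = -2, b2 = 0.5
coefficients = [10.0, -2.0, 0.5]

base = predict(coefficients, [3.0, 4.0])
bumped = predict(coefficients, [4.0, 4.0])  # X1 raised by one unit, X2 unchanged
change = bumped - base
print(change)  # -2.0, the coefficient on X1
```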
Coefficient of Multiple Determination
The coefficient of multiple determination, $R^2$, is the percentage of the variance in the dependent variable that is explained by the variation in the independent variables.
$$R^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS}$$
where
TSS = total sum of squares = $\sum (Y - \bar{Y})^2$
RSS = regression sum of squares = $\sum (Y_c - \bar{Y})^2$
ESS = error sum of squares = $\sum (Y - Y_c)^2$
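The three sums of squares can be computed directly from the observed values $Y$ and the computed values $Y_c$. The following is a small sketch on hypothetical data; the function name `sums_of_squares` is an assumption, and `y_c` holds the least-squares fit $Y_c = 2.2 + 0.6X$ for $X = 1, \dots, 5$. The identity TSS = RSS + ESS relies on the computed values coming from a least-squares fit that includes an intercept.

```python
def sums_of_squares(y, y_c):
    """Return (TSS, RSS, ESS) for observed values y and computed values y_c."""
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)           # total
    rss = sum((yc - mean_y) ** 2 for yc in y_c)         # regression (explained)
    ess = sum((yi - yc) ** 2 for yi, yc in zip(y, y_c))  # error (unexplained)
    return tss, rss, ess


# Hypothetical observations and their least-squares computed values.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_c = [2.8, 3.4, 4.0, 4.6, 5.2]

tss, rss, ess = sums_of_squares(y, y_c)
r_squared = 1 - ess / tss
print(tss, rss, ess, r_squared)  # roughly 6.0, 3.6, 2.4, 0.6
```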
Adjusted Coefficient of Determination
Adding independent variables to a multiple regression equation never decreases the coefficient of determination: each new independent variable makes the computed values fit the sample at least as closely, whether or not it has real explanatory power.
To balance the effect that the number of independent variables has on the coefficient of multiple determination, we use an adjusted coefficient of multiple determination.
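The slides do not reproduce the adjustment formula, so the sketch below assumes the usual textbook definition, $R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}$, with $n$ the sample size and $k$ the number of independent variables. The function name and the numbers are hypothetical. It illustrates how a tiny gain in $R^2$ from an extra variable can still lower the adjusted value.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n observations and k independent variables.

    Assumes the common textbook definition; requires n > k + 1.
    """
    if n <= k + 1:
        raise ValueError("need more observations than estimated coefficients")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical: one variable explains 60% of the variation in a sample of 5.
one_variable = adjusted_r_squared(0.60, n=5, k=1)
# Hypothetical: a second variable lifts R^2 only to 61%.
two_variables = adjusted_r_squared(0.61, n=5, k=2)
print(one_variable, two_variables)  # the second value is smaller
```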
Multiple Linear Regression - Example
Salsberry Realty sells homes along the east coast of the United States. One of the questions most frequently asked by prospective buyers is: If we purchase this home, how much can we expect to pay to heat it during the winter? The research department at Salsberry has been asked to develop some guidelines regarding heating costs for single-family homes. Three variables are thought to relate to the heating costs: (1) the mean daily outside temperature, (2) the number of inches of insulation in the attic, and (3) the age in years of the furnace.
To investigate, Salsberry’s research department selected a random sample of 20 recently sold homes. It determined the cost to heat each home last January, as well
Testing the Multiple Regression Model
The test is used to investigate whether any of the independent variables have significant coefficients.
The hypotheses are:
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
$H_1$: Not all $\beta$s equal 0
The test statistic follows the F distribution with k (number of independent variables) and n-(k+1) degrees of freedom, where n is the sample size.
Decision Rule:
Reject $H_0$ if $F > F_{\alpha,\,k,\,n-k-1}$
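For reference, the F statistic used in this test is conventionally computed from the regression and error sums of squares, each divided by its degrees of freedom. The expression below is the standard form written in the RSS/ESS notation of these slides; it is not copied from the original deck.

$$F = \frac{RSS / k}{ESS / \bigl(n - (k + 1)\bigr)}$$

In the Salsberry Realty example, $k = 3$ independent variables and $n = 20$ homes give $3$ and $20 - (3 + 1) = 16$ degrees of freedom.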
The ANOVA Table
The ANOVA (Analysis of Variance) table reports the variation in the dependent variable. The variation is divided into two components.
The Explained Variation is that accounted for by the set of independent variables.
The Unexplained or Random Variation is not accounted for by the independent variables.
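A rough sketch of how the rows of such a table are assembled is shown below. The function name `anova_table`, the dictionary layout, and the input numbers are all hypothetical; the figures are not the Salsberry Realty values, whose table is not reproduced in this text.

```python
def anova_table(rss, ess, n, k):
    """Build the rows of a regression ANOVA table.

    rss: regression (explained) sum of squares
    ess: error (unexplained) sum of squares
    n:   sample size, k: number of independent variables
    """
    df_regression = k
    df_error = n - (k + 1)
    ms_regression = rss / df_regression  # mean square, regression
    ms_error = ess / df_error            # mean square, error
    return [
        {"source": "Regression", "df": df_regression, "ss": rss,
         "ms": ms_regression, "f": ms_regression / ms_error},
        {"source": "Error", "df": df_error, "ss": ess,
         "ms": ms_error, "f": None},
        {"source": "Total", "df": n - 1, "ss": rss + ess,
         "ms": None, "f": None},
    ]


# Hypothetical sums of squares for a sample of 5 with one independent variable.
table = anova_table(rss=3.6, ess=2.4, n=5, k=1)
for row in table:
    print(row)
```

The explained and unexplained rows add up to the total row in both the degrees of freedom and the sums of squares.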
Interpretation
The computed value of F is 21.90, which is in the rejection region.
The null hypothesis that all the multiple regression coefficients are zero is therefore rejected.
Interpretation: some of the independent variables (amount of insulation, etc.) do have the ability to explain the variation in the dependent variable (heating cost).
Logical question: which ones?
Evaluating Individual Regression Coefficients ($\beta_i = 0$)
This test is used to determine which independent variables have nonzero regression coefficients.
Variables whose regression coefficients are not significantly different from zero are usually dropped from the analysis.
The test statistic follows the t distribution with n-(k+1) degrees of freedom.
The hypothesis test is as follows:
$H_0: \beta_i = 0$
$H_1: \beta_i \neq 0$
Reject $H_0$ if $t > t_{\alpha/2,\,n-k-1}$ or $t < -t_{\alpha/2,\,n-k-1}$
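The decision rule can be sketched as follows, assuming the usual statistic $t = (b_i - 0) / s_{b_i}$, where $b_i$ is the estimated coefficient and $s_{b_i}$ its standard error. The function names, coefficients, and standard errors below are hypothetical and are not the Salsberry Realty estimates. The critical value 2.120 is the tabulated two-tailed value for $\alpha = 0.05$ with 16 degrees of freedom, which matches $n - (k + 1) = 20 - 4$.

```python
def t_statistic(b_i, se_b_i):
    """t = (b_i - 0) / s_b_i under the null hypothesis beta_i = 0."""
    return (b_i - 0.0) / se_b_i


def reject_null(t, t_critical):
    """Two-tailed rule: reject H0 when t falls in either tail."""
    return t > t_critical or t < -t_critical


T_CRITICAL = 2.120  # t table, alpha/2 = 0.025, 16 degrees of freedom

# Hypothetical coefficient estimates and standard errors.
t_a = t_statistic(-4.5, 0.8)  # large in magnitude relative to its standard error
t_b = t_statistic(6.0, 4.0)   # small relative to its standard error

print(t_a, reject_null(t_a, T_CRITICAL))  # -5.625 True
print(t_b, reject_null(t_b, T_CRITICAL))  # 1.5 False
```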
Critical t-stat for the Slopes
Interpretation
The t-test shows that mean outside temperature and attic insulation are significantly (p-value < 0.05) and negatively associated with cost of heating, but age of furnace has no significant (p-value > 0.05) relationship with cost of heating.
Qualitative Independent Variables
Frequently we wish to use nominal-scale variables, such as gender, whether the home has a swimming pool, or whether the sports team was the home or the visiting team, in our analysis. These are called qualitative variables.
To use a qualitative variable in regression analysis, we use a scheme of dummy variables in which one of the two possible conditions is coded 0 and the other 1.
Use of dummy variables
Dummy variables are used when a nominal-scale variable is to be included in the regression.
When there are two categories of the variable, then one dummy variable is used.
When there are n categories, then n-1 dummy variables are used.
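The n-1 rule can be illustrated with a short sketch. The function `dummy_code` and the region labels are hypothetical; one category is chosen as the baseline and receives all zeros, so three categories need only two dummy columns.

```python
def dummy_code(values, baseline):
    """Code a nominal variable as 0/1 columns, omitting the baseline category.

    Returns the column names and one row of 0/1 codes per input value.
    """
    categories = sorted(set(values) - {baseline})
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical nominal variable with three categories.
regions = ["north", "south", "east", "south", "north"]
columns, rows = dummy_code(regions, baseline="north")

print(columns)  # ['east', 'south']
for region, row in zip(regions, rows):
    print(region, row)
```

Leaving out the baseline avoids a redundant column: with an intercept in the model, a full set of n dummies would always sum to one.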
Suppose in the Salsberry Realty example that the independent variable “garage” is added. For those homes without an attached garage, 0 is used; for homes with an attached garage, a 1 is used.
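To see what the 0/1 coding does, consider the computed heating cost with the garage variable added. The symbols $b_0, \dots, b_4$ and the ordering of the variables are assumptions made for illustration: $X_1$ is mean outside temperature, $X_2$ attic insulation, $X_3$ furnace age, and $X_4$ the garage dummy.

$$Y_c = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4$$

$$X_4 = 0: \quad Y_c = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3$$

$$X_4 = 1: \quad Y_c = (b_0 + b_4) + b_1 X_1 + b_2 X_2 + b_3 X_3$$

Under this sketch, $b_4$ is the difference in computed heating cost between homes with and without an attached garage when the other three variables are held constant.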