Quantitative Data Analysis: Hypothesis Testing


Chapter 12: Quantitative Data Analysis: Hypothesis Testing

Research Methodology

Type I Errors, Type II Errors & Statistical Power

Type I error (α): the probability of rejecting the null hypothesis when it is actually true.

Type II error (β): the probability of failing to reject the null hypothesis given that the alternative hypothesis is actually true.

Statistical power (1 − β): the probability of correctly rejecting the null hypothesis. Power depends on alpha, the sample size, and the effect size.

Testing Hypotheses on a Single Mean

One-sample t-test: a statistical technique used to test the hypothesis that the mean of the population from which a sample is drawn is equal to a comparison standard.
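As an illustration, here is a minimal sketch of a one-sample t-test in Python using scipy; the scores and the comparison standard of 70 are invented for the example.

```python
from scipy import stats

# Hypothetical exam scores for one sample of students
scores = [65, 72, 68, 74, 71, 69, 66, 73, 70, 75]

# H0: the population mean equals the comparison standard of 70
t_stat, p_value = stats.ttest_1samp(scores, popmean=70)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")  # reject H0 if p < alpha (e.g. 0.05)
```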

Testing hypotheses about two related means

Paired samples t-test: used to examine differences in the same group before and after a treatment.

The Wilcoxon signed-rank test: a non-parametric test for examining significant differences between two related samples or repeated measurements on a single sample. Used as an alternative to the paired samples t-test when the population cannot be assumed to be normally distributed.

Example: research methodology scores of ten students in the first week and the last week of the semester.
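Since the score table itself is not reproduced here, the sketch below uses made-up first-week and last-week scores for ten students to show both the paired samples t-test and the Wilcoxon signed-rank test in Python.

```python
from scipy import stats

# Hypothetical scores of the same ten students in the first and last week
first_week = [55, 60, 48, 70, 65, 58, 62, 50, 67, 59]
last_week  = [61, 66, 52, 75, 70, 60, 68, 55, 72, 64]

# Paired samples t-test (assumes the differences are roughly normal)
t_stat, p_t = stats.ttest_rel(first_week, last_week)

# Wilcoxon signed-rank test: the non-parametric alternative
w_stat, p_w = stats.wilcoxon(first_week, last_week)

print(f"paired t-test: t = {t_stat:.3f}, p = {p_t:.3f}")
print(f"Wilcoxon: W = {w_stat:.1f}, p = {p_w:.3f}")
```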

Testing hypotheses about two related means

McNemar's test: a non-parametric method used on nominal data. It assesses the significance of the difference between two dependent samples when the variable of interest is dichotomous. It is used primarily in before-after studies to test for an experimental effect.

Example: performance of students before and after an extra class.
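A minimal sketch of McNemar's test in Python on an invented 2x2 pass/fail table for the extra-class example; the counts are hypothetical.

```python
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical pass/fail counts before and after the extra class
# rows = before (pass, fail), columns = after (pass, fail)
table = [[20, 5],    # passed before: 20 still pass, 5 now fail
         [15, 10]]   # failed before: 15 now pass, 10 still fail

result = mcnemar(table, exact=True)  # exact binomial test on the discordant pairs
print(f"statistic = {result.statistic}, p = {result.pvalue:.3f}")
```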

Testing hypotheses about two unrelated means

• Independent samples t-test: done to see whether there are any significant differences between the means of two groups on the variable of interest.
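A short sketch, with invented scores for two unrelated groups, of an independent samples t-test in Python.

```python
from scipy import stats

# Hypothetical scores for two unrelated groups (e.g. two different classes)
group_a = [72, 75, 68, 80, 77, 74, 69, 73]
group_b = [65, 70, 62, 66, 71, 64, 68, 63]

# Welch's independent samples t-test (does not assume equal group variances)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```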

Testing hypotheses about several means

• Analysis Of Variance (ANOVA) helps to examine the significant mean differences among more than two groups on an interval or ratio-scaled dependent variable.
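A minimal one-way ANOVA sketch in Python; the three groups and their scores are invented for illustration.

```python
from scipy import stats

# Hypothetical scores for three groups (e.g. three teaching methods)
method_1 = [70, 72, 68, 75, 71]
method_2 = [78, 80, 76, 82, 79]
method_3 = [65, 63, 68, 66, 64]

# One-way ANOVA: are the group means significantly different?
f_stat, p_value = stats.f_oneway(method_1, method_2, method_3)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```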

Regression Analysis

• Simple regression analysis is used in a situation where one metric independent variable is hypothesized to affect one metric dependent variable.

Scatter plot

[Scatter plot of likelihood to date (LKLHD_DATE, Y axis) against physical attractiveness (PHYS_ATTR, X axis)]

Simple Linear Regression

Yi = β0 + β1Xi + εi

Estimated from the sample as Ŷi = β̂0 + β̂1Xi
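A minimal sketch of fitting the model above by ordinary least squares in Python; the PHYS_ATTR and LKLHD_DATE values are invented stand-ins for the data behind the scatter plot.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: physical attractiveness (X) and likelihood to date (Y)
phys_attr = np.array([35, 42, 50, 58, 63, 70, 77, 85])
lklhd_date = np.array([30, 38, 45, 55, 62, 70, 78, 90])

# Fit Yi = b0 + b1*Xi + ei by ordinary least squares
X = sm.add_constant(phys_attr)   # adds the intercept column
model = sm.OLS(lklhd_date, X).fit()
print(model.params)              # b0 (const) and b1 (slope)
print(model.summary())           # R-squared, t-tests, etc.
```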

Standardized regression coefficients

Standardized regression coefficients, or beta coefficients, are the estimates resulting from a multiple regression analysis performed on variables that have been standardized. This is usually done to allow the researcher to compare the relative effects of the independent variables on the dependent variable when the independent variables are measured in different units of measurement.
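A rough sketch of how beta coefficients can be obtained in Python: standardize every variable and refit the regression. The variables and data below are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data with predictors measured in very different units
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "income_usd": rng.normal(50000, 10000, 100),
    "age_years": rng.normal(40, 8, 100),
})
df["spending"] = 0.02 * df["income_usd"] + 50 * df["age_years"] + rng.normal(0, 300, 100)

# Standardize all variables (z-scores); the slopes of the refitted model are
# the beta coefficients, directly comparable across predictors
z = (df - df.mean()) / df.std()
X = sm.add_constant(z[["income_usd", "age_years"]])
print(sm.OLS(z["spending"], X).fit().params)
```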

Regression with dummy variables

• A dummy variable (also known as an indicator variable, design variable, categorical variable, binary variable, or qualitative variable)

• Dummy variables allow us to use nominal or ordinal variables as independent variables to explain, understand, or predict the dependent variable.
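A small sketch of regression with a dummy-coded nominal predictor in Python; the department and salary data are invented, and the formula interface creates the dummy variables automatically.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: salary (metric DV) and department (nominal IV)
df = pd.DataFrame({
    "salary": [3000, 3200, 2800, 4000, 4200, 3900, 3500, 3600],
    "dept": ["HR", "HR", "HR", "IT", "IT", "IT", "Sales", "Sales"],
})

# C(dept) is expanded into dummy (indicator) variables, with one category
# (here HR) serving as the reference group
model = smf.ols("salary ~ C(dept)", data=df).fit()
print(model.params)  # intercept = reference-group mean; dummies = differences from it
```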

MULTICOLLINEARITY

• A commonly encountered statistical phenomenon in which two or more independent variables in a multiple regression model are highly correlated.

• It can make the estimation of the regression coefficients unreliable and sometimes even impossible.

• To detect multicollinearity, we must check the correlation matrix for the independent variables.

• High correlations are a first sign of sizeable multicollinearity.

TWO MEASURES:

Tolerance value and Variance Inflation Factor (VIF): two measures that indicate the degree to which one independent variable is explained by the other independent variables (tolerance = 1 − R² from regressing that variable on the others; VIF = 1 / tolerance).
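A brief sketch of computing VIF and tolerance for each predictor in Python; the three predictors are simulated, with x2 deliberately made almost collinear with x1.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated independent variables; x2 is highly correlated with x1
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + rng.normal(scale=0.1, size=100)
x3 = rng.normal(size=100)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# VIF for each predictor (index 0 is the constant); tolerance = 1 / VIF
for i, name in enumerate(X.columns[1:], start=1):
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.3f}")
```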

A display of the FEV data in SPSS

• To fit multiple linear regression model in SPSS using the FEV data do the following:

• Analyze > Regression > Linear, then move Forced Expiratory Volume into the Dependent box and Smoke and Age into the Independent(s) box. Then click OK.

• This will give you the model summary table, ANOVA table and the regression coefficients table in the output window.
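For comparison, a rough Python equivalent of those SPSS steps, assuming the FEV data have been exported to a CSV file; the file name and column names (fev, smoke, age) are placeholders for this sketch.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder file and column names; adjust to match the exported FEV data
df = pd.read_csv("fev.csv")

# Multiple linear regression of forced expiratory volume on smoking status and age
model = smf.ols("fev ~ smoke + age", data=df).fit()
print(model.summary())  # model summary, F test and regression coefficients
```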

A demonstration of how to start fitting the multiple regression model in SPSS

A demonstration of how to select the dependent and independent variable(s) for fitting multiple regression in SPSS.

A demonstration of how to select diagnostic statistics for checking outlier and multicollinearity issues in SPSS.

Multicollinearity is not always a serious problem. But when the objective of the study is to reliably estimate the individual regression coefficients, multicollinearity is a problem, because the estimates of those coefficients may be unstable.

Methods to Reduce Multicollinearity

• Reduce the set of independent variables to a set that are not collinear.
• Use more sophisticated methods to analyze the data, such as ridge regression (sketched below).
• Create a new variable that is a composite of the highly correlated variables.
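A minimal ridge regression sketch in Python (scikit-learn), using simulated data with two nearly collinear predictors.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Simulated data with two highly correlated predictors
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.05, size=50)   # nearly collinear with x1
y = 2 * x1 + 3 * x2 + rng.normal(size=50)
X = np.column_stack([x1, x2])

# Ridge regression shrinks the coefficients and stabilizes the estimates;
# alpha controls the amount of shrinkage
ridge = Ridge(alpha=1.0).fit(X, y)
print(ridge.intercept_, ridge.coef_)
```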

Testing moderation using regression analysis: interaction effects

An interaction effect exists when the effect of one variable (X1) on Y depends on the value of another variable (X2). A moderating variable is a variable that modifies the original relationship between an independent variable and the dependent variable. Example:

H1: The students' judgement of the university's library is affected by the students' judgement of the computers in the library.

H2: The relationship between the judgement of the computers in the library and the judgement of the library is moderated by computer ownership.

The relationship between the judgement of the library and the judgement of the computers in the library can be modeled as follows:

Yi = β0 + β1X1i + εi (1)

We have also hypothesized that the effect of X1 on Y depends on X2. This can be modeled as follows:

β1 = α0 + α1X2i (2)

Substituting the second equation into the first one leads to the following model:

Yi = β0 + α0X1i + α1(X1i × X2i) + εi (3)

Model (3) states that the slope of model (1) is a function of variable X2. Although this model allows us to test moderation, the following model is better:

Yi = β0 + β1X1i + β2X2i + β3(X1i × X2i) + εi (4)
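A sketch of testing model (4) in Python with simulated data; y stands for the judgement of the library, x1 for the judgement of the computers, and x2 (coded 0/1) for computer ownership, all invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: library judgement (y), computer judgement (x1),
# computer ownership (x2) as the moderator
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(5, 1, n)
x2 = rng.integers(0, 2, n)
y = 1 + 0.3 * x1 + 0.2 * x2 + 0.4 * x1 * x2 + rng.normal(size=n)
df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

# 'x1 * x2' expands to x1 + x2 + x1:x2, i.e. model (4) with the interaction term
model = smf.ols("y ~ x1 * x2", data=df).fit()
print(model.params)  # a significant x1:x2 coefficient indicates moderation
```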

Other multivariate tests and analysis

• Discriminant analysis
• Logistic regression
• Conjoint analysis
• Two-way ANOVA
• MANOVA
• Canonical correlation

Other multivariate tests and analysis

• Discriminant analysis: helps to identify the independent variables that discriminate a nominally scaled dependent variable of interest.

Other multivariate tests and analysis

• Logistic regression: used when the DV is nonmetric; when the DV has only two groups, binary logistic regression is used; it allows the researcher to predict a discrete outcome.
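A short binary logistic regression sketch in Python; the pass/fail outcomes and study hours are invented.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: pass/fail (binary DV) and hours of study (IV)
df = pd.DataFrame({
    "passed": [0, 0, 0, 1, 0, 1, 1, 1, 1, 1],
    "hours": [1, 2, 3, 4, 4, 5, 6, 7, 8, 9],
})

# Binary logistic regression: models the log-odds of passing
model = smf.logit("passed ~ hours", data=df).fit()
print(model.params)  # coefficients on the log-odds scale
print(model.predict(pd.DataFrame({"hours": [3, 6]})))  # predicted probabilities
```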

Other multivariate tests and analysis

• Conjoint analysis: a statistical technique used in many fields; used to understand how consumers develop preferences for products or services; built on the idea that consumers evaluate the value of a product or service by combining the value provided by each attribute.

Other multivariate tests and analysis

• Two-way ANOVA: used to examine the effect of two nonmetric IVs on a single metric DV; enables us to examine main effects and also interaction effects that exist between the independent variables.

Other multivariate tests and analysis

• Two-way ANOVA example:
DV: satisfaction with the toy
IVs: i) toy colour (pink & blue) ii) gender (male & female)
Main effect of toy colour: pink toys give significantly more satisfaction than blue toys.
Main effect of gender: females are more satisfied with the toy than males.
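A compact two-way ANOVA sketch in Python for the toy example; the satisfaction scores are invented.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical toy-satisfaction data with two nonmetric IVs: colour and gender
df = pd.DataFrame({
    "satisfaction": [8, 7, 9, 8, 6, 5, 6, 5, 7, 6, 8, 7, 4, 5, 4, 5],
    "colour": ["pink"] * 8 + ["blue"] * 8,
    "gender": (["female"] * 4 + ["male"] * 4) * 2,
})

# Two-way ANOVA with both main effects and the colour x gender interaction
model = smf.ols("satisfaction ~ C(colour) * C(gender)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```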

Other multivariate tests and analysis

• Multivariate Analysis of Variance (MANOVA)

- a multivariate extension of analysis of variance
- the IVs are measured on a nominal scale and the DVs on an interval or ratio scale

i) The null hypothesis: H0: µ1 = µ2 = µ3 = ... = µn
ii) The alternate hypothesis: HA: µ1 ≠ µ2 ≠ µ3 ≠ ... ≠ µn
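A minimal MANOVA sketch in Python; the group labels and the two dependent scores are invented.

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical data: one nominal IV (group) and two metric DVs (score1, score2)
df = pd.DataFrame({
    "group": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "score1": [5.1, 4.9, 5.4, 5.0, 6.2, 6.0, 6.5, 6.1, 4.2, 4.0, 4.5, 4.3],
    "score2": [3.0, 3.2, 2.9, 3.1, 4.0, 4.1, 3.8, 4.2, 2.5, 2.4, 2.7, 2.6],
})

# Tests whether the mean vectors of (score1, score2) differ across groups
manova = MANOVA.from_formula("score1 + score2 ~ group", data=df)
print(manova.mv_test())  # Wilks' lambda, Pillai's trace, etc.
```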

Other multivariate tests and analysis

• Canonical correlation: examines the relationship between two or more DVs and several IVs.

Data warehousing

• Most companies are now aware of the benefits of creating a data warehouse that serves as the central repository of all data collected from disparate sources including those pertaining to the company's finance, manufacturing, sales, and the like.

Data Mining

• Complementary to the functions of data warehousing, many companies resort to data mining as a strategic tool for reaching new levels of business intelligence.

• Using algorithms to analyze data in a meaningful way, data mining more effectively leverages the data warehouse by identifying hidden relations and patterns in the data stored in it.

Operations Research

• Operations research (OR) or management science (MS) is another sophisticated tool used to simplify and thus clarify certain types of complex problems that lend themselves to quantification.
