Estimation and Hypothesis Testing
Faculty of Information Technology, King Mongkut’s University of Technology North Bangkok

Inferential Statistics
Inferential statistics is a body of quantitative techniques that enable the scientist to make appropriate generalization from limited observations. (Frank & Althoen, Statistics: Concepts and applications, 1994)

Content
- Estimation
- Hypothesis testing
  - Forming hypotheses
  - Testing population means
  - Testing population variances
  - Testing categorical data / proportion
- Hypothesis about many population means
  - One-way ANOVA
  - Two-way ANOVA

Estimation
- To infer a population parameter from a sample statistic
  - From x̄ (sample mean) to μ (population mean)
  - From the sample proportion to p (population proportion)
  - From SD² (sample variance) to σ² (population variance)
- Point estimation
- Interval estimation

Point Estimation
- A point estimator is a single-valued statistic that approximates the value of a population parameter.
- Usually the corresponding (unbiased) sample statistic is used:
  - Point estimate of μ: the sample mean x̄
  - Point estimate of the proportion p: the sample proportion
  - Point estimate of σ²: the sample variance SD²
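
As a minimal sketch of point estimation, the following Python snippet computes the three point estimates from a sample. The data values and variable names are hypothetical and chosen only for illustration; the slides themselves use SPSS.

```python
import statistics

# Hypothetical sample of exam scores (illustrative data, not from the slides).
scores = [52, 61, 58, 47, 66, 59, 63, 55]

# Point estimate of the population mean: the sample mean.
x_bar = statistics.mean(scores)

# Point estimate of the population variance: the sample variance
# (statistics.variance divides by n - 1, which makes it unbiased).
s2 = statistics.variance(scores)

# Hypothetical pass/fail outcomes: 1 = pass, 0 = fail.
passed = [1, 0, 1, 1, 0, 1, 1, 1]

# Point estimate of the population proportion: the sample proportion.
p_hat = sum(passed) / len(passed)

print("mean estimate:", x_bar)
print("variance estimate:", round(s2, 3))
print("proportion estimate:", p_hat)
```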

Interval Estimation
- A confidence interval is a range of values that is expected to include the population parameter.
- Involves calculating a ± margin around the sample statistic
  - Interval estimation of the mean
  - Interval estimation of the proportion
  - Interval estimation of the variance
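
The ± margin can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure SPSS uses internally: the interval for the mean takes a critical t value that the caller looks up in a t table, and the interval for the proportion uses the large-sample normal approximation. Function names and data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def mean_interval(sample, t_critical):
    """Interval for the mean: x_bar +/- t * s / sqrt(n).

    t_critical is the two-sided critical value for df = n - 1,
    looked up by the caller in a t table.
    """
    n = len(sample)
    x_bar = statistics.mean(sample)
    margin = t_critical * statistics.stdev(sample) / math.sqrt(n)
    return x_bar - margin, x_bar + margin

def proportion_interval(successes, n, confidence=0.95):
    """Large-sample (normal approximation) interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical data; 2.365 is the two-sided 95% t value for df = 7.
scores = [52, 61, 58, 47, 66, 59, 63, 55]
print(mean_interval(scores, t_critical=2.365))
print(proportion_interval(successes=60, n=100))
```

The interval for a variance is omitted here because it needs chi-square critical values, which are not available in the Python standard library.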

Hypothesis Testing
- Hypothesis testing is a scientific method used for making a decision, drawing a conclusion, or proving a research finding.
- Steps
  1. Form the statistical hypothesis from the research hypothesis
  2. Define the statistical significance level (α): usually 0.05 (5%), or 0.01 for research needing higher accuracy
  3. Select the appropriate statistic and calculate it (SPSS does the calculation)
  4. Accept or reject the hypothesis
  5. Make a decision

Hypothesis
- The expected rational conclusion of statistical analysis
- Research hypothesis: written in text
- Statistical hypothesis: written as a mathematical equation using parameters
- A hypothesis can be relational or comparative
- A hypothesis can be directional or non-directional

Research Hypothesis Examples
- Handwriting and examination score are related: relational, non-directional
- Handwriting and examination score are positively related: relational, directional
- Female students get a higher final exam score than male students: comparative, directional
- The scores of female and male students are different: comparative, non-directional

Statistical Hypothesis
- Mathematical form
- H0: Null hypothesis (or test hypothesis)
  - Always includes the equality case: must contain = (also ≥ and ≤)
- H1: Alternative hypothesis
  - Contains ≠, > or <

Statistical Hypothesis Examples
- Female students get a higher final exam score than male students
  H0: μf ≤ μm   H1: μf > μm
- Female students get a final exam score higher than or equal to (no less than) male students
  H0: ?   H1: ?
- The scores of female and male students are different
  H0: μf = μm   H1: μf ≠ μm

Error in Hypothesis Testing
- Type I error
  - Error caused by rejecting H0 when H0 is true
  - The probability of a type I error is α, the statistical significance level defined in the analysis
- Type II error
  - Error caused by accepting H0 when H0 is false
  - The probability of a type II error is β

Error in Hypothesis Testing
Decision      | H0 true (H1 false) | H0 false (H1 true)
Accepting H0  | Correct decision   | Type II error (β)
Rejecting H0  | Type I error (α)   | Correct decision
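
To illustrate why the probability of a type I error equals α, the simulation sketched below repeatedly tests a hypothesis that is true by construction and counts how often it is rejected. It assumes a two-tailed z-test with a known standard deviation of 1; the function name, sample size, and seed are arbitrary choices made for this example.

```python
import math
import random
from statistics import NormalDist

def type_one_error_rate(alpha=0.05, trials=4000, n=30, seed=1):
    """Fraction of two-tailed z-tests that reject H0 although H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Samples come from N(0, 1), so H0: mu = 0 is true by construction.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        # z = (x_bar - 0) / (sigma / sqrt(n)) with sigma = 1.
        z = (sum(sample) / n) * math.sqrt(n)
        if abs(z) >= z_crit:
            rejections += 1  # a type I error
    return rejections / trials

# The observed rejection rate should be close to alpha.
print(type_one_error_rate(alpha=0.05))
```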

Hypothesis Testing
- Directional test / one-tailed test
  - Right-tailed: H0: θ ≤ k, H1: θ > k
  - Left-tailed: H0: θ ≥ k, H1: θ < k
- Non-directional test / two-tailed test
  - H0: θ = k, H1: θ ≠ k
- θ: a population parameter

Testing Population Mean
- One Sample T-Test
- Independent Samples T-Test
- Paired Sample T-Test
- SPSS: Analyze -> Compare Means -> (select the test)

Steps in Testing Mean
- Form the statistical hypothesis from the research hypothesis: left-tailed, right-tailed, or two-tailed
- Define the statistical significance level (α): usually 0.05 (5%), or 0.01 for research needing higher accuracy
- Calculate the T value and compare it to the critical T value from the T table
  - Right-tailed: accept H0, reject H1 if T cal < T table; reject H0, accept H1 if T cal ≥ T table
  - Left-tailed: accept H0, reject H1 if T cal > -T table; reject H0, accept H1 if T cal ≤ -T table
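
The comparison of the calculated T value with the table value can be written as a small Python function. This is a sketch: the function name and the tail labels are hypothetical, and the two-tailed branch is an assumption based on the standard rule (reject when |T cal| ≥ T table, with T table taken at α/2), since only the one-tailed rules appear in the text above.

```python
def decide(t_cal, t_table, tail):
    """Apply the critical-value decision rule.

    t_cal   -- calculated T value
    t_table -- positive critical T value from the T table
    tail    -- "right", "left" or "two"
    """
    if tail == "right":
        reject = t_cal >= t_table
    elif tail == "left":
        reject = t_cal <= -t_table
    elif tail == "two":
        # Assumed standard rule; t_table must be the alpha/2 critical value.
        reject = abs(t_cal) >= t_table
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return "reject H0" if reject else "accept H0"

# Hypothetical values: 1.895 is the one-tailed 5% critical value for df = 7.
print(decide(2.40, 1.895, "right"))
print(decide(-1.20, 1.895, "left"))
```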

One Sample T-Test
- Tests the mean of one sample against a test value
- The variable is either interval or ratio
- EX: Test if the average total score is more than 55
  - H0: μ ≤ 55, H1: μ > 55
  - If the research hypothesis is true, we should reject H0 and accept H1
- Calculate the statistic (use SPSS)
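
For readers who want to see what is being calculated, the one-sample T value can be sketched in Python as t = (x̄ - test value) / (s / √n) with df = n - 1. The scores below are hypothetical, not the data set used in the slides.

```python
import math
import statistics

def one_sample_t(sample, test_value):
    """Return the calculated T value and the degrees of freedom."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t_cal = (statistics.mean(sample) - test_value) / standard_error
    return t_cal, n - 1

# Hypothetical total scores.
scores = [58, 62, 55, 60, 57, 64, 59, 61]
t_cal, df = one_sample_t(scores, test_value=55)

# Right-tailed test of H1: mu > 55; 1.895 is the one-tailed 5% critical
# value for df = 7 from a t table.
print(round(t_cal, 2), df, "reject H0" if t_cal >= 1.895 else "accept H0")
```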

SPSS Analysis Result
- SPSS uses Sig. (2-tailed), the p-value, to show the test result
  - SPSS only reports the two-tailed value
  - Divide this p-value by 2 to get the one-tailed value (valid when the sample mean lies in the direction stated by H1)
- If the p-value is less than α (e.g. 0.05), the test is significant
  - Reject H0, accept H1
  - Thus the average total score is more than 55 at significance level 0.05
- t: the calculated T value
- df: degrees of freedom
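
The halving rule only gives the right one-tailed p-value when the calculated T value points in the direction stated by H1. A minimal Python sketch of the conversion, with a hypothetical function name and hypothetical input values, could look like this:

```python
def one_tailed_p(two_tailed_p, t_cal, tail):
    """Convert a two-tailed p-value into a one-tailed p-value.

    tail -- "right" if H1 states "greater than", "left" if "less than"
    """
    if tail not in ("right", "left"):
        raise ValueError("tail must be 'right' or 'left'")
    in_h1_direction = t_cal > 0 if tail == "right" else t_cal < 0
    half = two_tailed_p / 2
    # If the sample points away from H1, the one-tailed p-value is large.
    return half if in_h1_direction else 1 - half

alpha = 0.05
p = one_tailed_p(two_tailed_p=0.033, t_cal=2.3, tail="right")
print(p, "significant" if p < alpha else "not significant")
```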

Independent Samples T-Test
- Tests the mean of one sample against another
- Assumptions
  - The independent variable consists of two independent groups.
  - The dependent variable is either interval or ratio.
  - The dependent variable is approximately normally distributed.
  - Similar variances between the two groups (homogeneity of variances)
https://statistics.laerd.com/spss-tutorials/independent-t-test-using-spss-statistics.php

Independent Samples T-Test
- EX: Test if male students get a lower total score than female students
  - H0: μm ≥ μf, H1: μm < μf
  - If the research hypothesis is true, we should reject H0 and accept H1
- Calculate the statistic (use SPSS)

Levene's Test for Equality of Variances
- For the independent samples T-Test, the calculation of the T value differs between two cases:
  - Both samples have the same variance (σ1² = σ2²)
  - The variances are different (σ1² ≠ σ2²)
- Use a variance test to determine which case applies
  - See Levene's Test for Equality of Variances in the table
  - If the value of Sig. is ≥ α (e.g. 0.05), the two variances are equal: use the first row of the result
  - If the value of Sig. is < α (e.g. 0.05), the two variances are NOT equal: use the second row of the result
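
The two calculations can be sketched in Python as follows. The pooled-variance formula corresponds to the equal-variance case, and the second function uses the Welch formula with Welch-Satterthwaite degrees of freedom for the unequal-variance case. Function names and group scores are hypothetical, and Levene's test itself is not implemented here.

```python
import math
import statistics

def pooled_t(a, b):
    """T value and df when equal variances are assumed."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2

def welch_t(a, b):
    """T value and df when equal variances are not assumed."""
    na, nb = len(a), len(b)
    va = statistics.variance(a) / na
    vb = statistics.variance(b) / nb
    t_cal = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t_cal, df

# Hypothetical total scores for two independent groups.
male = [50, 55, 48, 60, 52]
female = [58, 63, 57, 66, 61]
print(pooled_t(male, female))
print(welch_t(male, female))
```

With equal group sizes the two T values coincide and only the degrees of freedom differ; with unequal sizes both can differ.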

Result
- According to Levene's Test, use the first row (Sig. = 0.530 > α)
- The p-value (2-tailed) is 0.033 < α, thus the average scores of male and female students are different
- The p-value (1-tailed) is 0.033/2 = 0.0165 < α, thus the result is significant
- Check the Group Statistics: the female group has the higher mean, thus reject H0 and accept H1; the research hypothesis is true

Paired Sample T-Test
- Tests the means of paired samples against each other
  - Same sample group (or two dependent samples)
- Assumptions
  - The dependent variable is interval or ratio.
  - The differences in the dependent variable between the two related groups are approximately normally distributed.
  - The independent variable consists of two related groups or "matched pairs".
  - No outliers in the differences between the two related groups.
https://statistics.laerd.com/spss-tutorials/dependent-t-test-using-spss-statistics.php

Paired Sample T-Test
- EX: Test if the final score is not different from the midterm score of the same group of students
  - H0: μD = 0, H1: μD ≠ 0
  - If the research hypothesis is true, we should accept H0 and reject H1
- Calculate the statistic (use SPSS)
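
A paired test is a one-sample test on the differences between the paired values. The sketch below computes t = D̄ / (s_D / √n) with df = n - 1; the midterm and final scores are hypothetical.

```python
import math
import statistics

def paired_t(first, second):
    """T value and df for paired samples, based on second - first."""
    diffs = [s - f for f, s in zip(first, second)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

# Hypothetical scores of the same six students.
midterm = [20, 25, 22, 30, 27, 24]
final = [24, 27, 27, 31, 30, 29]
t_cal, df = paired_t(midterm, final)

# Two-tailed test: 2.571 is the alpha/2 = 0.025 critical value for df = 5.
print(round(t_cal, 3), df, "reject H0" if abs(t_cal) >= 2.571 else "accept H0")
```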

Result
- The p-value (2-tailed) is displayed as 0.000 (i.e. p < 0.001), which is < α, thus the result is significant
- Thus reject H0 and accept H1; the research hypothesis is false

Testing Categorical Data or Proportion
- One variable: binomial proportion
- One variable: multiple groups proportion (Goodness of Fit Test)
- Two variables: Chi-square Test of Independence
- Two variables: Test of Homogeneity

Binomial
- Determines whether the proportion of people in one of two categories is different from a specified amount
  - H0: pD = p0, H1: pD ≠ p0
- SPSS assumes numerical data
  - Recode the data into numbers, e.g. M, F -> 1, 2
- Analyze -> Nonparametric Tests -> Legacy Dialogs -> Binomial
- E.g. the proportion of male students is 0.5
  - H0: pD = 0.5, H1: pD ≠ 0.5
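
An exact binomial test can be sketched in plain Python. The two-sided p-value below sums the probabilities of all outcomes that are no more likely than the observed one under H0, which is one common definition; it is an assumption of this example and may not match how SPSS computes Exact Sig. in every case. The counts are hypothetical.

```python
from math import comb

def binomial_two_sided_p(k, n, p0=0.5):
    """Exact two-sided p-value for k successes in n trials under H0: p = p0."""
    pmf = [comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Sum every outcome at most as likely as the observed one; the small
    # tolerance guards against floating-point ties.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)

# Hypothetical data: 15 male students out of 20, tested against 0.5.
alpha = 0.05
p_value = binomial_two_sided_p(15, 20, p0=0.5)
print(round(p_value, 4), "reject H0" if p_value < alpha else "accept H0")
```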

Result
- Be careful with Test Prop.: SPSS treats the category of the first observation (row) as the first group
- Exact Sig. is 0.04 < α, so the result is significant; reject H0 and accept H1: the proportion of male students is not 0.5
- If tested at 0.6: [SPSS output for a test proportion of 0.6]

Multiple Groups Goodness of Fit Test
- Determines whether the proportion of groups is different from a specified ratio
- Test statistic: χ² = Σ (O - E)² / E
  - O: Observed
  - E: Expected
- Analyze -> Nonparametric Tests -> Legacy Dialogs -> Chi-Square
- E.g. the proportion of sections is 1:2:1:2:1
https://statistics.laerd.com/spss-tutorials/chi-square-goodness-of-fit-test-in-spss-statistics.php
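
The goodness-of-fit calculation from observed (O) and expected (E) counts can be sketched as below, using the standard statistic Σ (O - E)² / E with df = number of groups - 1. The observed counts are hypothetical; only the ratio 1:2:1:2:1 comes from the example above.

```python
def goodness_of_fit(observed, ratio):
    """Return the chi-square statistic, df and expected counts."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    # Expected counts split the total according to the specified ratio.
    expected = [total * r / ratio_sum for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1, expected

# Hypothetical number of students in five sections.
observed = [12, 18, 9, 25, 6]
chi2, df, expected = goodness_of_fit(observed, ratio=[1, 2, 1, 2, 1])

# 9.488 is the 5% critical value of the chi-square distribution for df = 4.
print(round(chi2, 2), df, "reject H0" if chi2 >= 9.488 else "accept H0")
```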

Result
- The values in the expected ratio correspond to the groups in order of appearance in the observation rows.
- Asymp. Sig. = 0.000 < α, so the result is significant; reject H0 and accept H1: the proportion is not 1:2:1:2:1
- Cells with an expected frequency of less than 5 might make the analysis not meaningful

Chi-square Test of Homogeneity
- Used to determine whether the proportion of one variable is similar when grouped by another variable (two or more groups in each variable)
  - H0: p1 = p2 = p3 = ... = pn
  - H1: p1 ≠ p2 ≠ p3 ≠ ... ≠ pn (the proportions are not similar for some or all groups)
- Data -> Weight Cases
  - Do not weight cases: SPSS uses the proportion of the total population
  - Weight cases by: select the frequency variable to test dependency
- Analyze -> Descriptive Statistics -> Crosstabs
  - Statistics -> tick Chi-square
  - Cells -> tick Expected (optional)

Result
- E.g. Is the proportion of selected majors similar for both genders of students?
  - H0: pm = pf, H1: pm ≠ pf
- Pearson Chi-Square: Asymp. Sig. 0.010 < α
- Reject H0 and accept H1: the proportion is not similar in each gender

Chi-square Test of Independence
- Used to determine whether the effects of one variable depend on the value of another variable (2 variables)
  - H0: Variable x and variable y are independent of each other
  - H1: Variable x and variable y are dependent on each other
  - H0: Σ (O - E)² = 0, H1: Σ (O - E)² ≠ 0
- Data -> Weight Cases
  - Do not weight cases: SPSS uses the proportion of the total population
  - Weight cases by: select the frequency variable to test dependency
- Analyze -> Descriptive Statistics -> Crosstabs
  - Statistics -> tick Chi-square
  - Cells -> tick Expected (optional)
https://statistics.laerd.com/spss-tutorials/chi-square-test-for-association-using-spss-statistics.php
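
For a table of two categorical variables, the expected count of each cell under independence is (row total × column total) / grand total, and the Pearson chi-square statistic sums (O - E)² / E over all cells with df = (rows - 1)(columns - 1). The sketch below applies this to a hypothetical 2 × 2 table of gender by major; it does not apply a continuity correction.

```python
def chi_square_table(table):
    """Pearson chi-square statistic and df for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = gender, columns = major.
table = [[20, 10],
         [10, 20]]
chi2, df = chi_square_table(table)

# 3.841 is the 5% critical value of the chi-square distribution for df = 1.
print(round(chi2, 3), df, "reject H0" if chi2 >= 3.841 else "accept H0")
```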

Result
- E.g. Determine if gender and major are independent based on total score
  - H0: gender and major are independent of each other
  - H1: gender and major are dependent on each other
- Pearson Chi-Square: Asymp. Sig. 0.00 < α
- Reject H0 and accept H1: the two variables are dependent on each other based on total score

Are they the same?
- The Test of Homogeneity and the Test of Independence use the same calculation
- The Test of Homogeneity tells whether the proportion is the same
  - H0: The proportion is similar for all groups
  - H1: The proportion is not similar for some/all groups
- The Test of Independence tells whether two variables are dependent
  - H0: The two variables are independent
  - H1: The two variables are dependent

Are they the same?
- Consider this: the proportion of selected majors is the same for any gender
- That means that no matter the gender, the proportions remain the same
- That means gender has no effect on the selection of major, and therefore the two are independent
