114
Basic statistics Hui Bian Office for Faculty Excellence

Basic Statistics 9 19 2013

  • Upload
    jegosss

  • View
    20

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Basic Statistics 9 19 2013

Basic statistics

Hui Bian

Office for Faculty Excellence

Page 2: Basic Statistics 9 19 2013

Basic statistics

• My contact information:

–Hui Bian, Statistics & Research Consultant

–Office for Faculty Excellence, 1001 Joyner library, room 1006

– Email: [email protected]

–Website: http://core.ecu.edu/ofe/StatisticsResearch/

2

Page 3: Basic Statistics 9 19 2013

Basic statistics

• Statistics: “a bunch of mathematics used to summarize, analyze, and interpret a group of numbers and observations”.

*It is a tool.

*Cannot replace your research design, your research questions, and theory or model you want to use.

3

Page 4: Basic Statistics 9 19 2013

Basic statistics

• Population: any group of interest or any group that researchers want to learn more about.

–Population parameters (unknown to us): characteristics of population

• Sample: a group of individuals or data are drawn from population of interest.

–Sample statistics: characteristics of sample

4

Page 5: Basic Statistics 9 19 2013

Basic statistics

• We are much more interested in the population from which the sample was drawn.

– Example: 30 GPAs as a representative sample drawn from the population of GPAs of the freshmen currently in attendance at a certain university or the population of freshmen attending colleges similar to a certain university.

5

Page 6: Basic Statistics 9 19 2013

Basic statistics

6

Page 7: Basic Statistics 9 19 2013

Basic statistics

• Two types of statistics

–Descriptive statistics

–Inferential statistics

7

Page 8: Basic Statistics 9 19 2013

Basic statistics

• Descriptive statistics:

–“are procedures used to summarize, organize, and make sense of a set of scores or observations.”

8

Page 9: Basic Statistics 9 19 2013

Basic statistics

• Inferential statistics:

–“are procedures used that allow researchers to infer or generalize observations made with samples to the larger population from which they were selected.”

9

Page 10: Basic Statistics 9 19 2013

Descriptive statistics

• Use descriptive statistics to describe, summarize, and organize set of measurements.

• Use descriptive statistics to communicate with other researchers and the public.

• Descriptive statistics: Central tendency and Dispersion

10

Page 11: Basic Statistics 9 19 2013

Descriptive statistics

• Measures of Central tendency: we use statistical measures to locate a single score that is most representative of all scores in a distribution.

–Mean

–Median

–Mode

11

Page 12: Basic Statistics 9 19 2013

Descriptive statistic

• The notations used to represent population parameters and sample statistics are different.

–For example

•Population size : N

• Sample size : n

12

Page 13: Basic Statistics 9 19 2013

Descriptive statistics

• Mean

–𝑋 (or M) for sample mean and μ for population mean

–𝑋 (x bar) = ∑𝑥

𝑛

–∑x means sum of all individual scores of x1-xn

–n means number of scores

13

Page 14: Basic Statistics 9 19 2013

Descriptive statistics

• Example 1: we want to know how 25 students performed in math tests.

• Data are in the next slide.

14

Page 15: Basic Statistics 9 19 2013

Descriptive statistics

15

Score (X) Frequency (f) fX

60 1 60

65 2 130

70 3 210

75 4 300

80 5 400

85 4 340

90 3 270

95 2 190

100 1 100

Sum 25 2000

Page 16: Basic Statistics 9 19 2013

Descriptive statistics

• How to calculate mean for those 25 scores?

• 𝑋 = ∑𝑓𝑥

𝑛 =

2000

25 = 80.00

16

Page 17: Basic Statistics 9 19 2013

Descriptive statistics

• Distribution of Example 1

17

Mean = 80

Page 18: Basic Statistics 9 19 2013

Descriptive statistics

• Median

–Data: 2, 3, 4, 5, 7, 10, 80. Mean of those scores is 15.86.

–80 is an outlier.

–Mean fails to reflect most of the data. We use median instead of mean to remove the influence of an outlier.

–Median is the middle value in a distribution of data listed in a numeric order.

18

Page 19: Basic Statistics 9 19 2013

Descriptive statistics

• Median

–Position of median = 𝑛+1

2

–For odd –numbered sample size: 3,6,5,3,8,6,7. First place each score in numeric order: 3,3,5,6,6,7,8. Position 4. median = 6

19

Page 20: Basic Statistics 9 19 2013

Descriptive statistics

• Median

• For even-numbered sample size: 3,6,5,3,8,6. First place each score in numeric order: 3,3,5,6,6,8. Position

3.5. Median = 5+6

2 = 5.5

• Example 2: we want to know average salary of 36 cases.

20

Page 21: Basic Statistics 9 19 2013

Descriptive statistics Salary Frequency

$20k 1

$25k 2

$30k 3

$35k 4

$40k 5

$45k 6

$50k 5

$55k 4

$200k 3

$205k 2

$210k 1

Total 36 21

Page 22: Basic Statistics 9 19 2013

Descriptive statistics

• Median = ?

• Position 18.5

• Which number is at position 18.5?

• Median = $45k

22

Page 23: Basic Statistics 9 19 2013

Descriptive statistics

• Mode

–The value in a data set that occurs most often or most frequently.

–Example: 2,3,3,3,4,4,4,4,7,7,8,8,8. Mode = 4

23

Page 24: Basic Statistics 9 19 2013

Descriptive statistics

• Dispersion (Variability): a measure of the spread of scores in a distribution.

24

Page 25: Basic Statistics 9 19 2013

Descriptive statistics

• Compare different distributions

25

Page 26: Basic Statistics 9 19 2013

Descriptive statistics

• Compare different distributions

26

Page 27: Basic Statistics 9 19 2013

Descriptive statistics

• Two sets of data have the same sample size, mean, and median.

• But they are different in terms of variability.

27

Page 28: Basic Statistics 9 19 2013

Descriptive statistics

• Dispersion

–Range

–Variance

–Standard deviation

28

Page 29: Basic Statistics 9 19 2013

Descriptive statistics

• Range

–It is the difference between the largest value and smallest value.

–It is informative for data without outliers.

29

Page 30: Basic Statistics 9 19 2013

Descriptive statistics

• Variance

–It measures the average squared distance that scores deviate from their mean.

–Sample variance (s2, population variance σ2 sigma)

30

Page 31: Basic Statistics 9 19 2013

Descriptive statistics

• How to calculate variance?

–𝑠2= ∑ 𝑥 −𝑥 2

𝑛−1 or

𝑠𝑠

𝑛−1: ss means sum of

squares.

–n-1 means: degree of freedom: the number of scores in a sample that are free to vary.

31

Page 32: Basic Statistics 9 19 2013

Descriptive statistics

• Example: five scores: 5, 10, 7, 8, 15

–Mean = 9

–Let’s calculate variance

• SS = (5-9)2 + (10-9)2 + (7-9)2 + (8-9)2 + (15-9)2 = 58

• Sample variance = 58/(5-1) = 14.5

32

Page 33: Basic Statistics 9 19 2013

Descriptive statistics

• Degree of freedom – Example 1. we have five scores: 1, 2, 3, and

two unknown scores: x and y. The mean of five values is equal to 3. x + y = 9.

– Example 2. we have five scores: 1, 2, and three unknown scores: x, y, and z. The mean of five values is equal to 3. x + y + z = 12.

33

Page 34: Basic Statistics 9 19 2013

Descriptive statistics

• Standard deviation (s, σ)

–It is the square root of variance.

–It is average distance that scores deviate from their mean.

–𝑠 = 𝑠𝑠

𝑛−1

34

Page 35: Basic Statistics 9 19 2013

Descriptive statistics

• Example 3: calculate standard deviation

35

Scores (x) Frequency(f) 𝒙 − 𝒙 (d) d2 fd2(ss)

100 6 100-115.5=-15.5 240.25 6*240.25

110 12 110-115.5= -5.5 30.25 12*30.25

120 16 120-115.5=4.5 20.25 16*20.25

130 6 130-115.5=14.5 210.25 6*210.25

Sum 40 3390.0

Page 36: Basic Statistics 9 19 2013

Descriptive statistics

• s = 3390

40−1= 9.32

• 𝑋 = 115.5

• Summary:

–When individual scores are close to mean, the standard deviation (SD) is smaller.

36

Page 37: Basic Statistics 9 19 2013

Descriptive statistics

• Summary

–When individual scores are spread out far from the mean, the standard deviation is larger.

–SD is always positive

–It is typically reported with mean.

37

Page 38: Basic Statistics 9 19 2013

Descriptive statistics

• Choosing proper measure of central tendency depends on:

–the type of distribution

–the scale of measurement

38

Page 39: Basic Statistics 9 19 2013

Descriptive statistics

• Mean describes data that are normally distributed and measures on an interval or ratio scale.

• Median is used when the data are not normally distributed.

39

Page 40: Basic Statistics 9 19 2013

Descriptive statistics

• Normal distribution

–Probability: the frequency of times an outcome is likely to occur divided by the total number of possible outcomes.

• It varies between 0 and 1.

• Example (next slide)

40

Page 41: Basic Statistics 9 19 2013

Descriptive statistics

• Normal distribution

41

Fail Pass Total

Male 3 2 5

Female 1 4 5

Total 4 6 10

1. What is the probability of Fail? 4/10 =.4 2. What is the probability of Pass? 6/10 = .6 3. What is the probability of Fail among males? 3/5 = .6 4. What is the probability of Pass among females? 4/5 = .8

Page 42: Basic Statistics 9 19 2013

Descriptive statistics

• Normal distribution/Normal curve

–Data are symmetrically distributed around mean, median, and mode.

–Also called the symmetrical, Gaussian, or bell-shaped distribution.

42

Page 43: Basic Statistics 9 19 2013

Descriptive statistics

• Normal curve

43

Page 44: Basic Statistics 9 19 2013

Descriptive statistics

• Normal curve

44

Page 45: Basic Statistics 9 19 2013

Descriptive statistics

• Characteristics of normal distribution

–The normal distribution is mathematically defined.

–The normal distribution is theoretical.

–The mean, median, and mode are all the same value at the center of the distribution.

45

Page 46: Basic Statistics 9 19 2013

Descriptive statistics

• Characteristics of normal distribution

–The normal distribution is symmetrical.

–The form of a normal distribution is determined by its mean and standard deviation.

–Standard deviation can be any positive value.

46

Page 47: Basic Statistics 9 19 2013

Descriptive statistics

• Characteristics of normal distribution

–The total area under the curve is equal to 1.

–The tails of normal distribution are always approaching to x axis, but never touch it.

47

Page 48: Basic Statistics 9 19 2013

Descriptive statistics

• Normal distribution/Normal curve

–We use normal distribution to locate probabilities for scores.

–The area under the curve can be used to determine the probabilities at different points.

48

Page 49: Basic Statistics 9 19 2013

Descriptive statistics

49

Proportions of area under the normal curve

Page 50: Basic Statistics 9 19 2013

Descriptive statistics

• Normal distribution: the standard deviation indicates precisely how the scores are distributed. Empirical rule:

–About 68% of all scores lie within one standard deviation of the mean. In another word, roughly two thirds of the scores lie between one standard deviation on either side of the mean.

50

Page 51: Basic Statistics 9 19 2013

Descriptive statistics

• Normal distribution

–About 95% of all scores lie within two standard deviation of the mean (Normal scores: close to the mean).

–About 99.7% of all scores lie within three standard deviation of the mean.

51

Page 52: Basic Statistics 9 19 2013

Descriptive statistics

• In another word, we have 95% chance of selecting a score that is within 2 standard deviation of mean.

• Less than 5% scores are far from the mean (NOT normal scores).

52

Page 53: Basic Statistics 9 19 2013

Descriptive statistics

• Standard normal distribution or Z distribution

–A normal distribution with mean = 0, and standard deviation = 1.

–A Z score is a value on the x-axis of a standard normal distribution

53

Page 54: Basic Statistics 9 19 2013

Descriptive statistics

• Standard normal distribution or Z distribution

54

Page 55: Basic Statistics 9 19 2013

Descriptive statistics

• z transformation

z =𝑋−𝑀

𝑆𝐷

55

X means individual value, M is mean and SD is standard deviation. In SPSS, go to Analyze > Descriptive Statistics > Descriptives to get Z scores

Page 56: Basic Statistics 9 19 2013

Descriptive statistics

• Normal table/z table

56

Page 57: Basic Statistics 9 19 2013

Descriptive statistics

• How to use z table?

–Example: a sample of scores are approximately distributed normally with mean 8 and standard deviation 2. What is the probability of score lower than 6?

57

Page 58: Basic Statistics 9 19 2013

Descriptive statistics

• How to use z table?

–Transform a raw score 6 into a z score

–z = (6-8)/2=-1

–Check the normal table p (probability) = 0.5-0.34=0.16

–The probability of obtaining score less than 6 is 16%

58

Page 59: Basic Statistics 9 19 2013

Descriptive statistics

59

Page 60: Basic Statistics 9 19 2013

Descriptive statistics

• Descriptive statistics in SPSS

–Frequencies

–Descriptives

–Explore

60

Page 61: Basic Statistics 9 19 2013

Descriptive statistics

• Exercise: use 2011 YRBSS data

–Use Explore function to get descriptive statistics for Q6 (height)

–Analyze > Descriptive Statistics > Explore

61

Page 62: Basic Statistics 9 19 2013

Descriptive statistics

62

Page 63: Basic Statistics 9 19 2013

Descriptive statistics

• SPSS output

63

Page 64: Basic Statistics 9 19 2013

Descriptive statistics

• SPSS output: Normal Quantile-Quantile (Q-

Q) plot

64

Page 65: Basic Statistics 9 19 2013

Inferential statistics

• The goal of statistics is to make inferences about a population based upon information obtained in a sample.

• Hypothesis testing is the method we use to test a claim or hypothesis about a parameter in a population using observed data.

65

Page 66: Basic Statistics 9 19 2013

Inferential statistics

• Steps of hypothesis testing

–State a hypothesis

–Set the criterion

–Compare what we observe with what we expect.

–Make a decision

66

Page 67: Basic Statistics 9 19 2013

Inferential statistics

• Five elements in significance test

–Assumptions

–Hypotheses

–Test statistics

–P-value

–conclusion

67

Page 68: Basic Statistics 9 19 2013

Inferential statistics

• Assumptions

–Type of data

–Form of the population distribution

–Method of sampling

–Sample size

68

Page 69: Basic Statistics 9 19 2013

Inferential statistics

• State a hypothesis

–Null hypothesis (H0): in a hypothesis testing, we start by assuming the null hypothesis is true.

–Alternative hypothesis: directly contradicts null hypothesis

– The hypothesis testing is all about testing null hypothesis.

69

Page 70: Basic Statistics 9 19 2013

Inferential statistics

• Null hypothesis (H0): population’s values are NOT different from each other.

– Example: H0 : There is NO difference in blood pressure between treatment group and control group among patients.

– Example: H0 : μ1= μ2 or μ1- μ2 = 0

– Example: H0 : two samples are drawn from the same population.

70

Page 71: Basic Statistics 9 19 2013

Inferential statistics

• Alternative hypothesis (H1): population’s values are different from each other.

– Example: H1 = There is difference in blood pressure between treatment group and control group among patients (nondirectional-two-tailed test).

– Example: or H1 = The blood pressure of treatment group is lower than the blood pressure of control group (directional-one-tailed test).

71

Page 72: Basic Statistics 9 19 2013

Inferential statistics

• Or two samples are drawn from the different populations.

• Or H1: μ1- μ2 ≠ 0

• Or H1: μ1- μ2 > 0

• Or H1: μ1- μ2 < 0

72

Page 73: Basic Statistics 9 19 2013

Inferential statistics

• The difference in blood pressure between treatment and control group is because of random error or chance (not statistically different).

• Or the difference is large enough to conclude that blood pressure values are statistically different between two groups or because of treatment effect.

73

Page 74: Basic Statistics 9 19 2013

Inferential statistics

• Set the criterion: set the level of significance (a prespecified cutoff point)

• Typically set at 0.05 (α level) or 0.01.

• The smaller the α level, the stronger the evidence must be to reject H0.

74

Page 75: Basic Statistics 9 19 2013

Inferential statistics

• P-value

• P-value is the probability of obtaining test statistic from sample data when null hypothesis is true.

• If p-value is less than 5%, we reject the null hypothesis (why?).

75

Page 76: Basic Statistics 9 19 2013

Inferential statistics

• Compute the test statistic

• Test statistic: such as t, F values (obtained value depends on tests used in data analysis): measures the extent of apparent departure from H0.

76

Page 77: Basic Statistics 9 19 2013

Inferential statistics

• Compute the test statistic – Example: we want to know whether there is

a difference between gender in height.

– Two variables: Q2 (gender) and Q6 (height)

–We want to compare two means

–H0 : μ1 (female) = μ2 (male)

–H1: μ1 (female) ≠ μ2 (male)

77

Page 78: Basic Statistics 9 19 2013

Inferential statistics

• We use independent-samples t test to get t statistic.

78

Page 79: Basic Statistics 9 19 2013

Inferential statistics

• Compute the test statistic

– The value of test statistic is used to make a decision regarding the null hypothesis: compare test statistic to the critical value.

79

Page 80: Basic Statistics 9 19 2013

Inferential statistics

80

• Obtain critical value: it depends on degree of freedom. –A cut-off value

–We need to look at t test table for example to obtain critical value.

– If the test statistics is less than the critical value, then you fail to reject the null hypothesis.

Page 81: Basic Statistics 9 19 2013

Inferential statistics

• Make a decision

–Compare test statistic to critical value

–p value: p value is the probability of obtaining a test statistic given that null hypothesis is true.

–Significance: when p < .05, reject the null hypothesis, we reach significance.

81

Page 82: Basic Statistics 9 19 2013

Inferential statistics

• Make a decision: use t test example

–t value is -98.11

–df = 13997

–Look at t table to get critical value

• It is equal to 1.96

–t value < -1.96

–Reject Null hypothesis

82

Page 83: Basic Statistics 9 19 2013

Inferential statistics

• Critical region

83

(Neutens & Rubinson, 1997)

Page 84: Basic Statistics 9 19 2013

Inferential statistics

• Types of errors

– Type I error (α): the probability of rejecting a null hypothesis that is actually true.

– Type II error (β): the probability of retaining a null hypothesis that is actually false.

84

Truth in the population

Fail to reject the Null Reject the Null

True Correct 1-α Type I error α

False Type II error β Correct 1-β (power)

Page 85: Basic Statistics 9 19 2013

Inferential statistics

• Type I and Type II errors are inversely related, which means the smaller the α level, the larger the Type II error.

• To keep both errors low, large sample size is important.

85

Page 86: Basic Statistics 9 19 2013

Inferential statistics

• Power: the probability of rejecting H0

when it is in fact false.

–Power = 1-β (β is Type II error)

–Statistical power is the ability to detect a true effect.

86

Page 87: Basic Statistics 9 19 2013

Inferential statistics

• Power

• If statistical power is high, the probability of making a Type II error, or concluding there is no effect when, in fact, there is one, goes down.

87

Page 88: Basic Statistics 9 19 2013

Inferential statistics

• Power

–For example, 80% power in a clinical trial means that the study has a 80% chance of ending up with a p value of less than 5% in a statistical test (i.e. a statistically significant treatment effect) if there really was an important difference between treatments.

88

Page 89: Basic Statistics 9 19 2013

Inferential statistics

• Effect size

–Example: how much of an effect/a difference the intervention had/made (magnitude of intervention effect).

–We want to know if the intervention effect is large or small, meaningful or trivial.

89

Page 90: Basic Statistics 9 19 2013

Inferential statistics

• Effect size

–Mean difference

–Correlation coefficient

–Odds ratio

–R2

90

Page 91: Basic Statistics 9 19 2013

Inferential statistics

• The relationship between effect size and power and sample size

–When effect size increases, power increases.

–When sample size is large enough, the power increases.

–Example: Cohen’s d = M1 - M2 / spooled

91

Page 92: Basic Statistics 9 19 2013

Inferential statistics

92

• Test means: t tests and Analysis of variance

–T tests

•one sample t test

• Independent-samples t test

•Paired-samples t test

Page 93: Basic Statistics 9 19 2013

Inferential statistics

• Test means: t tests and Analysis of variance

–Analysis of variance (ANOVA)

•One-way/two-way between subject design

•One-way/two-way within subject design

•Mixed design

93

Page 94: Basic Statistics 9 19 2013

Inferential statistics

• Correlation

• Linear regression

• Non-parametric tests

– Chi-Square tests

– Sign test

– Wilcoxon signed-rank t test

– Mann-Whitney U test

– Kruskal-Wallis H test

– Friedman test

94

Page 95: Basic Statistics 9 19 2013

Inferential statistics

• SPSS demonstration

95

Page 96: Basic Statistics 9 19 2013

Inferential statistics

• SPSS demonstration

96

Page 97: Basic Statistics 9 19 2013

Inferential statistics

• SPSS demonstration

97

Page 98: Basic Statistics 9 19 2013

Inferential statistics

• SPSS demonstration

98

Page 99: Basic Statistics 9 19 2013

Inferential statistics

• T-test for two independent sample means

–Example: we want to know if there is a gender (Q2) difference in height.

–H0: µ1(Female) = µ2(Male);

–H1: µ1(Female) ≠ μ2(Male)

99

Page 100: Basic Statistics 9 19 2013

Inferential statistics

• Think about the following.

–The mean differences by two groups can be due to chance.

–sampling and measurement error.

–Tests and measuring instrument used to collect data are not perfect.

100

Page 101: Basic Statistics 9 19 2013

Inferential statistics

• Calculate t value: SPSS can do that for us.

• 𝑡 = 𝑀

1−𝑀

2

𝑠1

2

𝑁1

+𝑠2

2

𝑁2

101

Page 102: Basic Statistics 9 19 2013

Inferential statistics

• T-test for two independent sample means

102

t = -98.15 < tcritical = -1.96 (critical value), p < .05. There is a significant difference in height between females and males. When sample size is greater than 120, tcritical

= 1.96 at α = 0.05.

Page 103: Basic Statistics 9 19 2013

Descriptive statistics

• Confidence interval (CI) for a Mean

–“ A CI for a parameter is a range of numbers within which the parameter is believed to fall.”

103

Page 104: Basic Statistics 9 19 2013

Descriptive statistics

104

Page 105: Basic Statistics 9 19 2013

Descriptive statistics

• Standard error

–“ is the standard deviation of a sampling distribution of sample means. It is the distance that sample mean values deviate from the value of the population mean.”

105

Page 106: Basic Statistics 9 19 2013

Descriptive statistics

• How to calculate CI?

–Compute sample mean and standard error.

–Choose the level of confidence interval and find the critical value at the level of confidence.

–Compute the estimation formula to find the confidence interval

106

Page 107: Basic Statistics 9 19 2013

Descriptive statistics

• 95% CI = 𝑋 ± critical value at 95% level ( α = .05) * standard error

• Example 4: two-independent sample t-test, gender differences in height.

–We use YRBSS 2011 data.

–Q2 (Gender) and Q6 (height)

107

Page 108: Basic Statistics 9 19 2013

Descriptive statistics

• Two-Independent sample t-test: go to Analyze > Compare Means > Independent Sample T Tests

108

Page 109: Basic Statistics 9 19 2013

Descriptive statistics

• Click Option > Choose 95%

109

Page 110: Basic Statistics 9 19 2013

Descriptive statistics

• Output

110

Page 111: Basic Statistics 9 19 2013

Descriptive statistics

• Output

111

Page 112: Basic Statistics 9 19 2013

Descriptive statistics

• In this case, critical value = 1.96 (check t distribution table, df = ∞)

• What do we learn from the 95% CI of mean difference?

112

Page 113: Basic Statistics 9 19 2013

Basic statistics

• References

–Agresti, A. & Finlay, B. (1997). Statistical methods for the social sciences. Upper Saddle River, NJ. Prentice Hall, Inc.

–Neutens, J. J., & Rubinson, L. (1997). Research techniques for the health sciences. Needham Heights, MA. Allyn & Bacon.

113

Page 114: Basic Statistics 9 19 2013

Basic statistics

• References

–Privitera, G. J. (2012). Statistics for the behavioral sciences. Thousand Oaks, CA. SAGE Publications, Inc.

114