Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention


Definition of Statistics

Statistics is the art of making numerical conjectures about puzzling questions.

--- Statistics, Fourth Edition, by Freedman


Definition of Statistics

Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data.

---From Wikipedia


Basic Terms

Variables: Characteristics that can take on any number of different values

Values: Possible numbers or categories that a variable can have

Scores: A particular person's value on a variable

Types of Data

Qualitative data -- nonnumeric
e.g., types of material {straw, sticks, bricks}

Quantitative data -- numeric

Discrete data -- numeric data that have a finite or countable number of possible values
e.g., counting numbers, {1,2,3,4,5}

Continuous data -- numeric data that have an infinite number of possible values (any value within a range)
e.g., real numbers

Types of Scale

Nominal --- has no order and thus only gives names or labels to various categories. Variables assessed on a nominal scale are called categorical variables.

Ordinal --- has order, but the interval between measurements is not meaningful.

Interval --- has meaningful intervals between measurements, but there is no true starting point (zero).
E.g., temperature on the Celsius scale

Ratio --- the highest level of measurement. Ratios between measurements as well as intervals are meaningful because there is a true starting point (zero).
E.g., length, time, plane angle, energy


EX


Collecting Data

“Twenty-five percent of Americans doubt that the Holocaust ever occurred.”

--- a news report in 1993

Census

Sample Survey

Why Study Samples?

Often not practical to study an entire population

Instead, researchers attempt to make samples representative of populations

Random selection
- Each member of the population has an equal chance of being sampled
- Good but difficult

Haphazard selection
- Take steps to ensure samples do not differ from the population in systematic ways
- Not as good but much more practical

Sample vs. Population

Sample: Relatively small number of instances that are studied in order to make inferences about a larger group from which they were drawn

Population: The larger group from which a sample is drawn

Sample vs. Population Examples

Population
a. pot of beans
b. larger circle
c. histogram

Sample
a. spoonful
b. smaller circle
c. shaded scores

Sampling Methods

Simple random sampling

Systematic sampling

Stratified sampling

Cluster sampling

Other sampling methods: quota sampling, mechanical sampling, and so on
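
A minimal sketch of the four named sampling methods, using only Python's standard library. The population of 100 ID numbers, the two strata, the clusters of ten and the fixed seed are all assumptions made for illustration; none of them come from the slides.

```python
import random

# Hypothetical population: ID numbers 1..100 (not from the slides).
population = list(range(1, 101))
rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simple random sampling: every member has an equal chance of selection.
simple_sample = rng.sample(population, 10)

# Systematic sampling: pick a random start, then take every k-th member.
k = len(population) // 10
start = rng.randrange(k)
systematic_sample = population[start::k]

# Stratified sampling: split into strata, then sample within each stratum.
strata = {"low": population[:50], "high": population[50:]}
stratified_sample = [x for group in strata.values() for x in rng.sample(group, 5)]

# Cluster sampling: split into clusters, then keep whole randomly chosen clusters.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
cluster_sample = [x for c in rng.sample(clusters, 2) for x in c]

print(simple_sample, systematic_sample, stratified_sample, cluster_sample, sep="\n")
```

In practice the strata and clusters would be defined by real characteristics of the population (for example, class year or school) rather than by position in a list.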


After Collecting... Before Analyzing...

Frequency Tables

Frequency table
- Shows how many times each value was used for a particular variable
- Can also give the percentage of scores at each value

Grouped frequency table
- Gives the number of scores that fall within each of several equally sized intervals

Steps for Making a Frequency Table

1. Make a list of each possible value, from highest to lowest

2. Go one by one through the data, making a mark for each data point next to its value on the list

3. Make a table showing how many times each value on your list was used

4. Figure the percentage of data points for each value

A Frequency Table

Stress rating   Frequency   Percent (%)
10              14           9.3
9               15           9.9
8               26          17.2
7               31          20.5
6               13           8.6
5               18          11.9
4               16          10.6
3               12           7.9
2                3           2.0
1                1           0.7
0                2           1.3
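
The four steps can be sketched in Python as follows. The raw stress ratings are not given on the slide, so this sketch rebuilds a list of scores from the frequencies in the table (an assumption made purely so there is something to tally) and then recomputes the counts and percentages.

```python
from collections import Counter

# Frequencies copied from the table; the raw scores are rebuilt from them.
table = {10: 14, 9: 15, 8: 26, 7: 31, 6: 13, 5: 18, 4: 16, 3: 12, 2: 3, 1: 1, 0: 2}
scores = [value for value, count in table.items() for _ in range(count)]

# Steps 1-3: tally how many times each value occurs.
frequency = Counter(scores)
total = len(scores)

# Step 4: percentage of scores at each value, listed from highest to lowest.
print("Stress rating  Frequency  Percent")
for value in sorted(frequency, reverse=True):
    percent = 100 * frequency[value] / total
    print(f"{value:>13}  {frequency[value]:>9}  {percent:>7.1f}")
```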

A Grouped Frequency Table

Stress rating interval   Frequency   Percent
10-11                    14           9
8-9                      41          27
6-7                      44          29
4-5                      34          23
2-3                      15          10
0-1                       3           2
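
A short sketch of how the grouped counts follow from the individual ratings. It assumes intervals of width 2 that start at even numbers, which matches the intervals shown; the per-rating frequencies are those of the ungrouped stress-rating table.

```python
# Frequencies per individual rating, taken from the ungrouped table.
table = {10: 14, 9: 15, 8: 26, 7: 31, 6: 13, 5: 18, 4: 16, 3: 12, 2: 3, 1: 1, 0: 2}
total = sum(table.values())
width = 2  # interval size used in the grouped table

# Assign each rating to the interval that starts at a multiple of `width`.
grouped = {}
for value, count in table.items():
    low = (value // width) * width
    grouped[low] = grouped.get(low, 0) + count

for low in sorted(grouped, reverse=True):
    percent = 100 * grouped[low] / total
    print(f"{low}-{low + width - 1}: {grouped[low]} ({percent:.0f}%)")
```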

Frequency Graphs

Histogram

Depicts information from a frequency table or a grouped frequency table as a bar graph

EX2

Shapes of Frequency Distributions

Unimodal: Having one peak

Bimodal: Having two peaks

Multimodal: Having two or more peaks

Rectangular: Having no peaks

Symmetrical vs. Skewed Frequency Distributions

Symmetrical distribution
- Approximately equal numbers of observations above and below the middle

Skewed distribution
- One side is more spread out than the other, like a tail

Direction of the skew
- Right or left (i.e., positive or negative)
- Side with the fewer scores
- Side that looks like a tail

Skewed Frequency Distributions

Skewed right (b)
- Fewer scores right of the peak
- Positively skewed
- Can be caused by a floor effect

Skewed left (c)
- Fewer scores left of the peak
- Negatively skewed
- Can be caused by a ceiling effect

Ceiling and Floor Effects

Ceiling effects
- Occur when scores can go no higher than an upper limit and "pile up" at the top
- e.g., scores on an easy exam, as shown on the right
- Cause negative skew

Floor effects
- Occur when scores can go no lower than a lower limit and pile up at the bottom
- e.g., household income
- Cause positive skew

Kurtosis

Degree to which the tails of the distribution are "heavy" or "light"
- Heavy tails = higher kurtosis (b)
- Light tails = lower kurtosis (c)
- Normal distribution = zero (excess) kurtosis (a)

Measures of Central Tendency

Central tendency = representative or typical value in a distribution

The mean, the median, and the mode can measure central tendency.

Mean

Computed by
- Summing all the scores (sigma, $\Sigma$)
- Dividing by the number of scores (N)

$$M = \frac{\sum X}{N}$$
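
As a small worked example of the mean (sum of the scores divided by N), here is a sketch in Python. The ten scores are hypothetical and chosen only for illustration.

```python
# Hypothetical scores, used only for illustration.
scores = [7, 8, 8, 7, 3, 1, 6, 9, 3, 8]

total = sum(scores)   # sum of all scores (sigma X)
n = len(scores)       # number of scores (N)
mean = total / n      # M = (sum of X) / N

print(f"sum = {total}, N = {n}, M = {mean}")
```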

Measures of Central Tendency

Mean
- Often the best measure of central tendency
- Most frequently reported in research articles

Think of the mean as the “balancing point” of the distribution

Measures of Central Tendency

Mode
- Most common single number in a distribution
- If the distribution is symmetrical and unimodal, the mode = the mean
- Typical way of describing central tendency of a nominal variable

Measures of Central Tendency

Median
- Middle value in a group of scores
- Point at which half the scores are above and half the scores are below
- Unaffected by extremity of individual scores, unlike the mean
- Preferable as a measure of central tendency when a distribution has some extreme scores

Measures of Central Tendency

Examples of means as balancing points of various distributions

Does not have to be a score exactly at the median

Note that a score’s distance from the balancing point matters in addition to the number of scores above or below it

Measures of Central Tendency

Examples of means and modes

Measures of Central Tendency

Steps to computing the median

1. Line up scores from highest to lowest
2. Figure out how many scores to the middle
- Add 1 to the number of scores
- Divide by 2
3. Count up to the middle score
- If there is 1 middle score, that’s the median
- If there are 2 middle scores, the median is their average

Ex3
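
The median steps translate fairly directly into code. The function name `median_by_steps` is hypothetical, and the sketch assumes a non-empty list of numeric scores.

```python
def median_by_steps(scores):
    # Step 1: line up the scores from highest to lowest.
    ordered = sorted(scores, reverse=True)
    n = len(ordered)
    # Step 2: add 1 to the number of scores and divide by 2.
    middle = (n + 1) / 2
    # Step 3: count up to the middle score (positions are 1-based).
    if middle.is_integer():
        return ordered[int(middle) - 1]   # one middle score
    lower = ordered[int(middle) - 1]      # score just before the midpoint
    upper = ordered[int(middle)]          # score just after the midpoint
    return (lower + upper) / 2            # two middle scores: take their average


print(median_by_steps([3, 9, 5, 7, 1]))      # odd number of scores
print(median_by_steps([3, 9, 5, 7, 1, 11]))  # even number of scores
```

The standard library's `statistics.median` gives the same result and is the more practical choice outside of teaching.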

Measures of Variation

Variation = how spread out the data are

Variance
- Measure of variation
- Average of each score’s squared deviation (difference) from the mean

Measures of Variation

Steps to computing the variance

1. Subtract the mean from each score: $x_i - \bar{x}$

2. Square each deviation value: $(x_i - \bar{x})^2$

3. Add up the squared deviation scores: $\sum (x_i - \bar{x})^2$

4. Divide the sum by the number of scores: $\frac{\sum (x_i - \bar{x})^2}{n}$

Ex4
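
A sketch of the four variance steps in Python, using a small hypothetical data set. It divides by the number of scores n, as in step 4; note that many software defaults divide by n - 1 instead (the sample variance), which gives a slightly larger value.

```python
# Hypothetical scores, used only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
mean = sum(scores) / n

deviations = [x - mean for x in scores]   # step 1: subtract the mean from each score
squared = [d ** 2 for d in deviations]    # step 2: square each deviation
sum_of_squares = sum(squared)             # step 3: add up the squared deviations
variance = sum_of_squares / n             # step 4: divide by the number of scores

print(f"mean = {mean}, sum of squares = {sum_of_squares}, variance = {variance}")
```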

Measures of Variation

Standard deviation
- Another measure of variation, roughly the average amount that scores differ from the mean
- Used more widely than variance
- Abbreviated as “SD”

To compute the standard deviation
- Compute the variance
- Simply take the square root

SD is the square root of the variance; the variance is SD squared

$$SD^2 = \text{Variance}, \qquad SD = \sqrt{\text{Variance}}$$
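
A brief sketch of the variance-to-SD relationship using Python's `statistics` module. The scores are hypothetical; `pvariance` and `pstdev` are the versions that divide by the number of scores, which is the definition used in these slides.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores

variance = statistics.pvariance(scores)   # divides by the number of scores
sd = math.sqrt(variance)                  # SD is the square root of the variance

# statistics.pstdev computes the same quantity in one call.
matches_library = math.isclose(sd, statistics.pstdev(scores))

print(f"variance = {variance}, SD = {sd}, matches pstdev: {matches_library}")
```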

Two Branches of Statistical Methods

Descriptive statistics
- Summarize and describe a group of numbers such as the results of a research study

Inferential statistics
- Allow researchers to draw conclusions and inferences that are based on the numbers from a research study, but go beyond these numbers

The Normal Curve

Often seen in social and behavioral science research and in nature generally

Particular characteristics
- Bell-shaped
- Unimodal
- Symmetrical
- Average tails

Bean Machine

Z Scores

A Z score indicates how many standard deviations an observation is above or below the mean

If Z > 0, the observation is above the mean
If Z < 0, the observation is below the mean

- A Z score of 1.0 is one SD above the mean
- A Z score of -2.5 is two-and-a-half SDs below the mean
- A Z score of 0 is at the mean

$$Z = \frac{X - M}{SD}$$

Z Scores

When values in a distribution are converted to Z scores, the distribution will have
- Mean of 0
- Standard deviation of 1

Useful
- Allows variables to be compared to one another
- Provides a generalized standard of comparison

Z Scores

To compute a Z score, subtract the mean from a raw score and divide by the SD

$$Z = \frac{X - M}{SD}$$

To convert a Z score back to a raw score, multiply the Z score by the SD and then add the mean

$$X = (Z)(SD) + M$$

Ex5
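
The two conversions can be sketched as a pair of small functions. The function names `to_z` and `to_raw` and the example scores are hypothetical; the last lines check the earlier claim that a set of scores converted to Z scores has a mean of 0 and an SD of 1.

```python
import statistics


def to_z(x, mean, sd):
    """Z = (X - M) / SD"""
    return (x - mean) / sd


def to_raw(z, mean, sd):
    """X = (Z)(SD) + M"""
    return z * sd + mean


# Hypothetical scores with M = 5 and SD = 2 (SD computed by dividing by N).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
m = statistics.mean(scores)
sd = statistics.pstdev(scores)

z_scores = [to_z(x, m, sd) for x in scores]
print("Z for a raw score of 9:", to_z(9, m, sd))
print("Raw score for Z = -2.5:", to_raw(-2.5, m, sd))
print("Mean and SD of the Z scores:", statistics.mean(z_scores), statistics.pstdev(z_scores))
```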

Confidence Interval

A confidence interval (CI) is a particular kind of interval estimate of a population parameter.

How likely the interval is to contain the parameter is determined by the confidence level

"95% confidence interval"

Animation

ex6
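
One way to see what the confidence level means is a small simulation: draw many samples, build an interval from each, and count how often the interval contains the true mean. The sketch below is an illustration under stated assumptions, not the method used in the original exercise: the population (normal, mean 100, SD 15), the sample size, the number of trials and the seed are all hypothetical, and the interval uses the normal cutoff of about 1.96 rather than a t cutoff, so coverage will be close to, but typically a little below, 95%.

```python
import random
import statistics
from statistics import NormalDist

rng = random.Random(0)                               # fixed seed for repeatability
true_mean, true_sd, n, trials = 100, 15, 30, 1000    # hypothetical population and design
z = NormalDist().inv_cdf(0.975)                      # about 1.96 for a 95% level

hits = 0
for _ in range(trials):
    sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5         # estimated standard error of the mean
    low, high = m - z * se, m + z * se
    hits += low <= true_mean <= high                 # did this interval capture the true mean?

coverage = hits / trials
print(f"{coverage:.1%} of the intervals contain the true mean")
```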

Correlation

A statistic for describing the relationship between two variables

Examples
- Price of a bottle of wine and its quality
- Hours of studying and grades on a statistics exam
- Income and happiness
- Caffeine intake and alertness

Graphing Correlations on a Scatter Diagram

Scatter diagram
- Graph that shows the degree and pattern of the relationship between two variables

Horizontal axis
- Usually the variable that does the predicting
- e.g., price, studying, income, caffeine intake

Vertical axis
- Usually the variable that is predicted
- e.g., quality, grades, happiness, alertness

Graphing Correlations on a Scatter Diagram

Steps for making a scatter diagram

1. Draw axes and assign variables to them

2. Determine the range of values for each variable and mark the axes

3. Mark a dot for each person’s pair of scores

Correlation

Linear correlation
- Pattern on a scatter diagram is a straight line
- Example above

Curvilinear correlation
- More complex relationship between variables
- Pattern in a scatter diagram is not a straight line
- Example below

Correlation

Positive linear correlation
- High scores on one variable matched by high scores on another
- Line slants up to the right

Negative linear correlation
- High scores on one variable matched by low scores on another
- Line slants down to the right

Correlation

Zero correlation

No line, straight or otherwise, can be fit to the relationship between the two variables

Two variables are said to be “uncorrelated”

Correlation Review

a. Negative linear correlation
b. Curvilinear correlation
c. Positive linear correlation
d. No correlation

Correlation Coefficient

Correlation coefficient, r, indicates the precise degree of linear correlation between two variables

Computed by taking “cross-products” of Z scores
- Multiply the Z score on one variable by the Z score on the other variable
- Compute the average of the resulting products

Can vary from -1 (perfect negative correlation) through 0 (no correlation) to +1 (perfect positive correlation)

$$r = \frac{\sum Z_X Z_Y}{N}$$
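
The cross-product recipe can be sketched as follows. The function name `correlation` and the study-hours data are hypothetical. The Z scores use SDs that divide by N, which is what makes the average cross-product fall between -1 and +1.

```python
import statistics


def correlation(xs, ys):
    """r = (sum of Z_X * Z_Y) / N, with Z scores based on SDs that divide by N."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    cross_products = [a * b for a, b in zip(zx, zy)]
    return sum(cross_products) / n


# Hypothetical data: hours of studying and grades on a statistics exam.
hours = [1, 2, 3, 4, 5]
grades = [52, 60, 61, 70, 77]
r = correlation(hours, grades)
print(f"r = {r:.3f}")
```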

Correlation and Causality

When two variables are correlated, there are three possible directions of causality
- X->Y
- X<-Y
- X<-Z->Y

Inherent ambiguity in correlations

Knowing that two variables are correlated tells you nothing about their causal relationship

Prediction

Correlations can be used to make predictions about scores

Predictor
- X variable
- Variable being predicted from

Criterion
- Y variable
- Variable being predicted

Sometimes called “regression”

Multiple Correlation and Multiple Regression

Multiple correlation
- Association between criterion variables and two or more predictor variables

Multiple regression
- Making predictions about criterion variables based on two or more predictor variables
- Unlike prediction from one variable, the standardized regression coefficient is not the same as the ordinary correlation coefficient

Proportion of Variance Accounted For

Correlation coefficients
- Indicate the strength of a linear relationship
- Cannot be compared directly
- e.g., an r of .40 is more than twice as strong as an r of .20

To compare correlation coefficients, square them
- An r of .40 yields an $r^2$ of .16; an r of .20 yields an $r^2$ of .04
- The squared correlation indicates the proportion of variance on the criterion variable accounted for by the predictor variable

R-square
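
The comparison of r = .40 with r = .20 is plain arithmetic, sketched here so the factor of four is explicit.

```python
r_values = [0.40, 0.20]

# Squaring each r gives the proportion of variance accounted for.
r_squared = [r ** 2 for r in r_values]

# How many times more variance the stronger correlation accounts for.
ratio = r_squared[0] / r_squared[1]

print([round(v, 2) for v in r_squared], round(ratio, 1))
```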

Most Commonly Used Statistical Techniques

Linear Regression (predicts the value of one numerical variable given another variable)
- How much does the maximum legibility distance of highway signs decrease when age is increased?

Data on winning bid price for 12 Saturn cars on eBay in July 2002

• Simple linear regression is a data analysis technique that tries to find a linear pattern in the data.

• In linear regression, we use all of the data to calculate a straight line which may be used to predict Price based on Miles.

• Since Miles is used to predict Price, Miles is called an 'Explanatory (Independent) Variable' while Price is called a 'Response (Dependent) Variable'.

• The slope of the line is -.05127, which means that predicted Price tends to drop 5 cents for every additional mile driven, or about $512.70 for every 10,000 miles.

• The intercept (or Y-intercept) of the line is $8136; this should not be interpreted as the predicted price of a car with 0 mileage because the data provides information only for Saturn cars between 9,300 miles and 153,260 miles.

• We can now use the line to predict the selling price of a car with 60000 miles. What is the height or Y value of the line at X=60000? The answer is
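
The height of the line at a given mileage is the intercept plus the slope times Miles. The sketch below applies that to the rounded slope and intercept quoted above; because those coefficients are rounded, the printed value may differ slightly from a figure computed from the unrounded fit. The function name `predicted_price` is hypothetical.

```python
# Rounded coefficients quoted on the slide.
INTERCEPT = 8136.0   # dollars
SLOPE = -0.05127     # dollars per mile


def predicted_price(miles):
    """Height of the fitted line at X = miles."""
    return INTERCEPT + SLOPE * miles


price_60k = predicted_price(60_000)
drop_per_10k = -SLOPE * 10_000   # predicted price drop per 10,000 miles

print(f"Predicted price at 60,000 miles: ${price_60k:,.2f}")
print(f"Predicted drop per 10,000 miles: ${drop_per_10k:,.2f}")
```

As the intercept bullet warns, the line should only be trusted for mileages within the range of the data (9,300 to 153,260 miles).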

Most Commonly Used Statistical Techniques

T-test (for the means)
- What is the mean time that college students watch TV per day?
- What is the mean pulse rate of women?


Hypothesis Testing

Procedure for deciding whether the outcome of a study supports a particular theory or practical innovation

Core Logic of Hypothesis Testing

Approach can seem curious or even backwards

Researcher considers the probability that the experimental procedure had no effect and that the observed result could have occurred by chance alone

If that probability is sufficiently low, the researcher will...
- Reject the notion that the experimental procedure had no effect
- Affirm the hypothesis that the procedure did have an effect

The Null Hypothesis and the Research Hypothesis

Null hypothesis (H0)
- Opposite of desired result
- Usually that manipulation had no effect

Research hypothesis (H1)
- Also called the “alternative hypothesis”
- Opposite of the null hypothesis
- What the experimenter desired or expected all along: that the manipulation did have an effect

One-tailed vs. Two-tailed Hypothesis Tests

Directional prediction
- Researcher expects experimental procedure to have an effect in a particular direction
- One-tailed significance tests may be used

Nondirectional prediction
- Researcher expects experimental procedure to have an effect but does not predict a particular direction
- Two-tailed significance test appropriate
- Takes into account that the sample could be extreme at either tail of the comparison distribution

One-tailed vs. Two-tailed Hypothesis Tests

Two-tailed tests

More conservative than one-tailed tests

Some believe that two-tailed tests should always be used, even when an experimenter makes a directional prediction

Significance Level Cutoffs for One- and Two-Tailed Tests

The .05 significance level

The .01 significance level
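
The cutoffs themselves were shown in a figure that did not survive extraction. As a sketch, and assuming the comparison distribution is a standard normal (Z) curve, the cutoff scores for both significance levels can be recovered with `statistics.NormalDist`:

```python
from statistics import NormalDist

standard_normal = NormalDist()   # assumption: a standard normal (Z) comparison distribution

cutoffs = {}
for alpha in (0.05, 0.01):
    one_tailed = standard_normal.inv_cdf(1 - alpha)        # all of alpha in one tail
    two_tailed = standard_normal.inv_cdf(1 - alpha / 2)    # alpha split across both tails
    cutoffs[alpha] = (one_tailed, two_tailed)
    print(f"alpha = {alpha}: one-tailed {one_tailed:.3f}, two-tailed +/-{two_tailed:.3f}")
```

The two-tailed cutoff is always further from zero than the one-tailed cutoff at the same level, which is why two-tailed tests are described as more conservative.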

Decision Errors

When the right procedure leads to the wrong conclusion

Type I Error
- Reject the null hypothesis when it is true
- Conclude that a manipulation had an effect when in fact it did not

Type II Error
- Fail to reject the null when it is false
- Conclude that a manipulation did not have an effect when in fact it did

P-value

The p-value is the probability of obtaining a result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.

Frequent misunderstandings

For more details, please refer to Wikipedia.

Decision Errors

Setting a strict significance level (e.g., p < .001)
- Decreases the possibility of committing a Type I error
- Increases the possibility of committing a Type II error

Setting a lenient significance level (e.g., p < .10)
- Increases the possibility of committing a Type I error
- Decreases the possibility of committing a Type II error

Test Statistic

- Value computed from sample information
- Basis for rejecting/not rejecting the null hypothesis
- Used to compute the p-value

Example:

T-test

A t-test is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is unknown and is replaced by an estimate based on the data, the test statistic follows a Student's t distribution.

t-test

One-sample t test

Two-sample t test
- Independent two-sample
- Dependent two-sample

Equal sample size, equal variance

Unequal sample size, equal variance

The Hypothesis Testing Process

1. Restate the research question as a research hypothesis and a null hypothesis about the populations
2. Set the level of significance, $\alpha$.
3. Collect the sample and compute the test statistic.
4. Assume H0 is true, compute the p-value.
5. If p-value < $\alpha$, reject H0.
6. State your conclusion.

SUMMARY OF HYPOTHESIS TESTS

Ex7,8
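
The six steps can be walked through with a one-sample t test. Everything specific in this sketch is an assumption made for illustration: the null value of 72, the ten pulse rates and the helper names `t_pdf` and `two_tailed_p`. To stay within the standard library, the p-value is obtained by numerically integrating the Student's t density with Simpson's rule; a statistics package would normally do this step.

```python
import math
import statistics


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=10_000):
    """P(|T| >= |t|): 1 minus the area between -|t| and |t| (Simpson's rule, even steps)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = 2 * a / steps
    area = t_pdf(-a, df) + t_pdf(a, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(-a + i * h, df)
    return 1 - area * h / 3


# Step 1: H0: population mean = 72; H1: population mean != 72 (hypothetical question).
mu0 = 72
# Step 2: set the level of significance.
alpha = 0.05
# Step 3: collect the sample (hypothetical pulse rates) and compute the test statistic.
sample = [74, 78, 71, 80, 76, 73, 79, 75, 77, 72]
n = len(sample)
t_stat = (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))
# Step 4: assuming H0 is true, compute the p-value.
p_value = two_tailed_p(t_stat, df=n - 1)
# Step 5: if the p-value is below alpha, reject H0.
reject_h0 = p_value < alpha
# Step 6: state the conclusion.
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```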


Most Commonly Used Statistical Techniques

Analysis of Variance (testing differences of means for 2 or more groups)

- Is GPA related to where a student likes to sit (front, middle, back)?

- Which internet search engine is the fastest?

Analysis of Variance

- Abbreviated as “ANOVA”
- Used to compare the means of more than two groups
- Null hypothesis is that all populations being studied have the same mean
- Reject null if at least one population has a mean that differs from the others
- Actually works by analyzing variances

Two Different Ways of Estimating Population Variance

Estimate population variance from variation within each group
- Is not affected by whether or not the null hypothesis is true

Estimate population variance from variation between each group
- Is affected by whether or not the null hypothesis is true

Two Important Questions

1. How to estimate population variance from variation between groups?

2. How is that estimate affected by whether or not the null is true?

Estimate population variance from variation between means of groups

First, variation among means of samples is related directly to the amount of variation within each population from which samples are taken

The more variation within each population, the more variation in means of samples taken from those populations

Note that populations on the right produce means that are more scattered

Estimate population variance from variation between means of groups

And second, when the null is false there is an additional source of variation

When the null hypothesis is true (left), variation among means of samples is caused by
- Variation within the populations

When the null hypothesis is false (right), variation among means of samples is caused by
- Variation within the populations
- And also by variation among the population means

Basic Logic of ANOVA

ANOVA entails a comparison between two estimates of population variance

Ratio of between-groups estimate to within-groups estimate called an F ratio

Compare obtained F value to an F distribution

$$F = \frac{\text{Between Groups}}{\text{Within Groups}}$$
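
A sketch of the F ratio for a one-way design, computed from the usual between-groups and within-groups mean squares. The function name `f_ratio` and the three small groups are hypothetical; comparing the result with an F distribution (with k - 1 and N - k degrees of freedom) is left to a table or a statistics package.

```python
import statistics


def f_ratio(groups):
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [statistics.mean(g) for g in groups]

    # Between-groups estimate: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ms_between = ss_between / (k - 1)

    # Within-groups estimate: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_within = ss_within / (n_total - k)

    return ms_between / ms_within


# Hypothetical scores for three groups (e.g., front, middle, back of the room).
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
F = f_ratio(groups)
print(f"F = {F:.2f}")
```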

Assumptions of an ANOVA

Populations follow a normal curve

Populations have equal variances

As for t tests, ANOVAs often work fairly well even when those assumptions are violated

Page 84: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention

Rejecting the Null Hypothesis

A significant F tells you that at least one of the means differs from the others

Does not indicate how many differ

Does not indicate which one(s) differ

For more specific conclusions, a researcher must conduct follow-up t tests

Problem: Running lots of t tests increases the chance of finding a significant result just by chance (i.e., the overall chance of a false positive rises above .05)
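
A rough sense of the size of this problem, as a sketch: if the follow-up tests were independent (an assumption; pairwise t tests on the same groups are not fully independent), the chance of at least one false positive across k tests at level alpha is 1 - (1 - alpha)^k. The function names below are hypothetical.

```python
def pairwise_comparisons(num_groups):
    """Number of distinct pairs of groups that could be compared."""
    return num_groups * (num_groups - 1) // 2


def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests


k = pairwise_comparisons(4)          # 4 groups give 6 pairwise t tests
rate = familywise_error_rate(k)      # about 0.265, well above 0.05
print(k, round(rate, 3))
```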

ANOVA (continued)

Procedure that allows one to examine two or more variables in the same study

Efficient

Allows for examination of interaction effects

An ANOVA with only one variable is a one-way ANOVA, an ANOVA with two variables is a two-way ANOVA, and so on

Main Effects vs. Interactions

A main effect refers to the effect of one variable, averaging across the other(s)

An interaction effect refers to a case in which the effect of one variable depends on the level of another variable
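
Both definitions can be read off a table of cell means. The sketch below uses a hypothetical 2 x 2 design (dose by age group) with made-up cell means; the variable and function names are illustrative only.

```python
from statistics import mean

# Hypothetical cell means: cell_means[dose][age_group]
cell_means = {
    "low dose": {"young": 10.0, "old": 12.0},
    "high dose": {"young": 14.0, "old": 24.0},
}


def row_means(cells):
    """Main effect of the row variable: average across the columns."""
    return {row: mean(list(cols.values())) for row, cols in cells.items()}


def column_means(cells):
    """Main effect of the column variable: average across the rows."""
    col_names = list(next(iter(cells.values())))
    return {c: mean([cells[r][c] for r in cells]) for c in col_names}


def simple_effects_of_rows(cells):
    """Effect of the row variable (last row minus first) at each column level."""
    rows = list(cells)
    first, last = rows[0], rows[-1]
    return {c: cells[last][c] - cells[first][c] for c in cells[first]}


dose_means = row_means(cell_means)             # low 11.0, high 19.0
age_means = column_means(cell_means)           # young 12.0, old 18.0
dose_effect_by_age = simple_effects_of_rows(cell_means)  # young 4.0, old 12.0
print(dose_means, age_means, dose_effect_by_age)
```

The dose effect is 4 for the young group but 12 for the old group, so the effect of dose depends on the level of age: that difference is the interaction.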

Main Effects vs. Interactions

Most Commonly Used Statistical Techniques

Chi-square test of independence (Relationship of 2 categorical variables)

- With whom is it easier to make friends?

- Does the opinion on legalization of marijuana depend on one’s religion?

Chi-Square Tests

Hypothesis testing procedure for nominal variables

Focus on number of people/items in each category (e.g., hair color, political party, gender)

Compare how well an observed distribution fits an expected distribution

Expected distribution can be based on

A theory

Prior results

Assumption of equal distribution across categories

Chi-Square Test for Goodness of Fit

Single nominal variable

Degrees of freedom = number of categories minus 1

Chi-Square Statistic

Compares observed frequency distribution to expected frequency distribution

Compute difference between observed and expected and square each one

Weight each by its expected frequency

Sum them

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Ex9
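
The steps on this slide translate directly into a few lines of code. This is a minimal sketch with made-up counts for one nominal variable with four categories, assuming an equal expected distribution; the function name is hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [20, 30, 25, 25]            # made-up counts, 100 observations
expected = [25.0, 25.0, 25.0, 25.0]    # equal distribution across categories

chi_square = chi_square_statistic(observed, expected)
df = len(observed) - 1                 # number of categories minus 1

print(chi_square, df)  # 2.0 3
```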

Chi-Square Distribution

Compare obtained chi-square to a chi-square distribution

Does mismatch between observed and expected frequency exceed what would be expected by chance alone?

Chi-Square Test for Independence

Two nominal variables

Independence means no relation between variables

Contingency table

Lists number of observations for each combination of categories

To determine degrees of freedom…

$$df = (N_{\text{Columns}} - 1)(N_{\text{Rows}} - 1)$$

To determine expected frequencies…

$$E = \left(\frac{R}{N}\right)(C)$$

(R = row total, C = column total, N = total number of observations)
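
The two formulas can be applied to a small contingency table as follows. This is a sketch with made-up counts in a 2 x 2 table; the function names are hypothetical.

```python
def expected_frequencies(table):
    """E = (R / N) * C for every cell of a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[(r / n) * c for c in col_totals] for r in row_totals]


def degrees_of_freedom(table):
    """df = (number of columns - 1) * (number of rows - 1)."""
    return (len(table[0]) - 1) * (len(table) - 1)


def chi_square_independence(table):
    """Sum of (O - E)^2 / E over every cell."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


table = [[30, 10],
         [20, 40]]   # made-up counts, 100 observations

print(expected_frequencies(table))      # about [[20, 20], [30, 30]]
print(degrees_of_freedom(table))        # 1
print(chi_square_independence(table))   # about 16.67
```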

Most Commonly Used Statistical Techniques

Correlation (Relationship of 2 numerical variables)

- Is there a connection between the average verbal SAT and the percent of graduates who took the SAT in a state?
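
A question like this is usually answered with a correlation coefficient. The sketch below computes Pearson's r from its definition. The numbers are made up for illustration and are not real SAT data; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two numerical variables."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up values for five states (illustration only)
percent_taking_sat = [5, 20, 40, 60, 80]
average_verbal_sat = [580, 550, 520, 500, 490]

r = pearson_r(percent_taking_sat, average_verbal_sat)
print(r)  # negative: in this made-up data, higher participation goes with lower averages
```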

Other Statistical Techniques

Factor analysis (reducing independent variables which are highly correlated)

Cluster analysis (grouping observations with similar characteristics)

Correspondence Analysis (grouping the levels of 2 or more categorical variables)

Time Series Analysis

And so on…

Inference with highest confidence level

Definition of Statistics

Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data.

---From Wikipedia

Presentation of Data

FOR CATEGORICAL DATA

--- Bar Chart

--- Pie Chart

Presentation of Data

FOR NUMERICAL DATA

--- Stem-and-Leaf Plot

--- Histogram

--- Boxplot

Overview of Statistical Techniques

Questions?

or

Comments?

Upcoming Workshops

10/26/2009 Overview of SPSS

12/02/2009 Overview of SAS

How to lie with statistics

1. The Sample with Built-in Bias.

2. Well-Chosen Average.

3. The Gee-Whiz Graph.

4. Correlation and Causation.