INFO 515 Lecture #10

Action Research Review

Glenn Booker


Why do we do this?

Measurements are needed to understand a system and predict its future behavior
Statistical techniques provide a commonly accepted means of analyzing measurements
Statistics is based on recognizing that measurements tend to fall over a range of values, not just one precise number


Types of Research

Historical (what happened?)
Descriptive (what is happening?)
Developmental (over time)
Case and Field (study an organization)
Correlational (does A affect B?)
Causal Comparative (what caused it?)
True Experimental (single / double blind)
Quasi-Experimental
Action Research


Data Analysis

Raw data, such as one survey result
Refined data, such as the distribution of ages of Philadelphia residents
Derived data, such as comparing the age distribution of Philadelphia residents to that of the country


Population vs. Sample

Often the subject of interest (population) is so big it isn’t feasible to measure it all
Then a sample of measurements can be made, and we want to relate the sample measurement to the population


Sampling

Sampling can be done using probabilistic techniques (e.g. various random samples)
  Simple or stratified random, Cluster (geographic), or Systematic (every Nth) samples
Or using non-probabilistic methods (whoever’s convenient, specific groups, or experts)
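The probabilistic techniques named above can be sketched with Python's standard `random` module. This is only an illustration: the population of 100 numbered residents, the two age strata, and the function names are all made up for the example.

```python
import random

def simple_random_sample(population, k, seed=0):
    """Draw k members without replacement, each equally likely."""
    return random.Random(seed).sample(population, k)

def systematic_sample(population, step, start=0):
    """Take every Nth member (N = step), beginning at index `start`."""
    return population[start::step]

def stratified_sample(strata, fraction, seed=0):
    """Draw the same fraction at random from each stratum (dict: name -> members)."""
    rng = random.Random(seed)
    return {name: rng.sample(members, max(1, round(len(members) * fraction)))
            for name, members in strata.items()}

# Hypothetical population: residents numbered 1..100
population = list(range(1, 101))
srs = simple_random_sample(population, 10)
every_tenth = systematic_sample(population, 10)

# Hypothetical strata: 60 residents under 40, 40 residents aged 40 and over
strata = {"under_40": population[:60], "40_and_over": population[60:]}
strat = stratified_sample(strata, 0.10)
```

Stratifying keeps each group's share of the sample equal to its share of the population, which a simple random sample only achieves on average.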


Customer Satisfaction Surveys

A special case of sampling, customer satisfaction surveys are often done using:
  In-person interview
  Telephone interview
  Questionnaire by mail
Sample sizes are based on the allowable error, population size, and the result obtained


Measurement Scales

Measurements can use four major types of scales; the types of analysis possible depend strongly on the type of measurements used
  Nominal (named buckets, without sequence)
  Ordinal (ordered buckets)
  Interval (intervals mean something; can add and subtract)
  Ratio (you can form ratios; can add, subtract, multiply, and divide)


Discrete versus Continuous

Discrete (nonparametric) measurements use nominal or ordinal scales; only specific values are allowed
  Car make = Chevy, or cost = High
Continuous (parametric) measurements use interval or ratio scales, and generally have integer or real number values
  Temperature = 98.6 deg F, Height = 172.1 cm


Descriptive Statistics

Many common statistics can describe the central tendency and spread of a set of measurements
  Average (arithmetic mean)
  Minimum, Maximum, Range
  Median (middle value)
  Mode (most common value)
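As a small illustration, the statistics listed above can be computed with Python's standard `statistics` module. The list of ages is invented for the example.

```python
import statistics

# Hypothetical sample of seven ages
ages = [23, 25, 25, 31, 38, 42, 54]

summary = {
    "mean": statistics.mean(ages),      # arithmetic average
    "median": statistics.median(ages),  # middle value of the sorted data
    "mode": statistics.mode(ages),      # most common value
    "minimum": min(ages),
    "maximum": max(ages),
    "range": max(ages) - min(ages),
}
```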


Normal Distribution

Many measurements can be described by a “normal” distribution, which is summarized by an average value and a standard deviation, or s
We can predict how likely any range of values is to occur for a normal distribution (how often is X between 5 and 8?)


Z Score

Z scores measure how far from the mean a single measurement is
  z = (Xi - mean) / s
Same formula used for finding “t” too
Does not apply only to a normal distribution, but if the distribution is normal, then we can predict the probability of that value or higher/lower occurring
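A minimal sketch of the z score and the probabilities it gives for a normal distribution, using `statistics.NormalDist` from the Python standard library. The mean of 6 and standard deviation of 2 are assumed values, chosen so that the earlier question "how often is X between 5 and 8?" can be answered.

```python
from statistics import NormalDist

def z_score(x, mean, s):
    """Distance of one measurement from the mean, in standard deviations."""
    return (x - mean) / s

# Assumed distribution: mean 6, standard deviation 2
mean, s = 6.0, 2.0
standard_normal = NormalDist()  # mean 0, standard deviation 1

z_8 = z_score(8.0, mean, s)
z_5 = z_score(5.0, mean, s)

p_8_or_higher = 1.0 - standard_normal.cdf(z_8)
p_8_or_lower = standard_normal.cdf(z_8)
p_between_5_and_8 = standard_normal.cdf(z_8) - standard_normal.cdf(z_5)
```

Under these assumptions X falls between 5 and 8 a little over half the time (about 53%).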


Standard Error

A sample of N measurements will have a standard error SEx = s / sqrt(N)
The standard error allows us to define the confidence interval, CI
  CI = mean +/- crit*SEx
  where “crit” is the critical z score for a large sample, or the critical t score for a small sample
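The two formulas above can be sketched as follows for the large-sample case, where "crit" is the critical z score. The sample summary (mean 50, s = 10, N = 100) is hypothetical, and the function name is chosen for illustration.

```python
import math
from statistics import NormalDist

def confidence_interval(mean, s, n, confidence=0.95):
    """CI = mean +/- crit * SEx, using the critical z score (large sample)."""
    se = s / math.sqrt(n)                                   # SEx = s / sqrt(N)
    crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-tailed critical z
    return mean - crit * se, mean + crit * se

# Hypothetical sample summary
low, high = confidence_interval(mean=50.0, s=10.0, n=100)
```

For a small sample the same structure applies, but "crit" would be looked up as a critical t score for df = n-1 instead.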


Critical z and t

The critical z score is only a function of the desired confidence level of the results (zc = 1.96 for 95% confidence level)
Critical t score is a function of the sample size (degrees of freedom, df = n-1) and the desired confidence level
  As df gets very large, critical t approaches critical z
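As an illustration, the two-tailed critical z score can be computed from the confidence level with `statistics.NormalDist`. The standard library has no t distribution, so the critical t values below are copied from a standard two-tailed 95% t table rather than computed.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Two-tailed critical z score for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

z_values = {level: round(critical_z(level), 3) for level in (0.90, 0.95, 0.99)}

# Two-tailed 95% critical t by degrees of freedom (values from a standard t table)
CRITICAL_T_95 = {5: 2.571, 10: 2.228, 30: 2.042, 120: 1.980}

# Gap between critical t and critical z shrinks as df grows
gaps = {df: round(t - z_values[0.95], 3) for df, t in CRITICAL_T_95.items()}
```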


Confidence Level

We have to accept some level of uncertainty in a statistical analysis – our conclusion might be wrong!
Generally, a 95% level of confidence is used, unless life is on the line - then a 99% level of confidence is required
  Use 95% typically, hence critical significance is 0.050


Confidence Level

The level of confidence of your results, plus the critical significance, always equals exactly one
For practically every statistical test, a significance of the result less than the critical value means you reject the null hypothesis
  If Sig actual < Sig crit, reject null hypothesis


Frequency and Percentage

Frequency graphs and crosstabs can provide a lot of information just from counts of a nominal or ordinal measurement occurring, possibly given with the percentages of each event’s occurrence
Histograms can provide similar charts for ratio or interval scaled data


Scatterplots

Scatter plots or diagrams show the relationship between two or more measures
The horizontal axis is generally the independent variable (X), sometimes also called a factor or grouping variable
The vertical axis is generally the dependent variable (Y), which is the measure you’re trying to understand


Hypothesis Testing

Some statistics are used in the context of testing a hypothesis - a statement whose truth you wish to determine
  Are Philadelphians more likely to be Nobel Prize winners?
The Null hypothesis is the opposite of the hypothesis, and generally says there is no difference or no effect observed
  Philadelphians are no more likely to be Nobel Prize winners than any other group


Hypothesis Testing

Can’t truly PROVE anything - only determine if the differences observed are “not likely to be due to chance”
Select one or more “Tests of Significance” to determine if there is a statistically significant difference (Yes/No); if Yes, then can
Select one or more “Measures of Association” to describe the strength of the difference, and possibly its direction


One versus Two Tailed Tests

A null hypothesis which tests for “no difference” uses a two tailed test
A null hypothesis which specifically tests for “greater than” uses a one tailed test
A null hypothesis which specifically tests for “less than” uses a one tailed test
One versus two tailed changes the critical z or t score; a one tailed test generally makes it easier to show significance – that’s why the more conservative two-tailed tests are generally used


Z or T Test

The z or t tests can be used to compare two distribution means, or compare one distribution mean to a fixed value (interval or ratio data)
Compare the actual z or t score to the critical z or t score
If the actual z or t score is closer to zero than the critical value, accept the null hypothesis
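A minimal sketch of the comparison described above, for one sample mean against a fixed value with a large sample (so z rather than t). The sample figures and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, fixed_value, s, n):
    """Actual z score: difference from the fixed value in standard errors."""
    return (sample_mean - fixed_value) / (s / math.sqrt(n))

# Hypothetical large sample: mean 52.5, s = 10, N = 100, compared to 50
z_actual = one_sample_z(52.5, 50.0, s=10.0, n=100)
z_crit = NormalDist().inv_cdf(0.975)  # two-tailed, 95% confidence

# Closer to zero than the critical value: accept the null hypothesis
reject_null = abs(z_actual) > z_crit
```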


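Z or T Test (Two Tailed)

[Figure: z or t scale centered on the mean, with -crit and +crit marked. The region between -crit and +crit is labeled “Accept Null Hypothesis”; the regions beyond either critical value are labeled “Reject Null Hypothesis”. An X marks the actual z or t.]

Notice this is for the z or t value, NOT the significance of that value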


Z or T Test (One Tailed)

[Figure: z or t scale centered on the mean, with only +crit marked. The region below +crit is labeled “Accept Null Hypothesis”. An X marks the actual z or t.]

(Case here is testing if the actual value is greater than the mean; for a “less than” case, use only the negative critical value.)


Is My Sample Normal?

Boxplots and stem-and-leaf diagrams can help show graphically whether a sample has a fairly normal distribution
The skewness and kurtosis of a data set can help identify non-normality, if their values are more than two times their own standard errors
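The "more than two times its own standard error" rule can be sketched for skewness as follows. The slides do not give the formulas, so this assumes the commonly used bias-adjusted sample skewness and its usual standard error; kurtosis would follow the same pattern with its own formulas. Both data sets are invented.

```python
import math
import statistics

def skewness_with_se(data):
    """Bias-adjusted sample skewness and its standard error (assumed formulas)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se

def looks_skewed(data):
    """Apply the rule of thumb: |skewness| more than twice its standard error."""
    skew, se = skewness_with_se(data)
    return abs(skew) > 2 * se

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]    # hypothetical, evenly spread
lopsided = [1, 1, 1, 2, 2, 2, 3, 3, 20]    # hypothetical, one large outlier
```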


T Tests

T tests compare means for ratio or interval data
Independent t test is for two different strata within one data set
Paired t test is to compare measures of the same group before and after some event (drug test), or when the samples are otherwise believed to be dependent on each other
One-sample t test compares one sample to a fixed value


T Tests

Null hypothesis is that there is no difference between the means
Results (e.g. significance) may differ if variances are not equal, since df changes
The Levene test checks for equal variances
  Null hypothesis for the Levene test is that the variances are equal
  If the Levene significance < 0.050, variances are not equal (reject the null hypothesis)


Independent T Test Evaluation

Three ways to check the results of a T test
  If the T test’s significance < 0.050, reject the null hypothesis
  Check the stated t value against the critical t value for this ‘df’ level; if |t(actual)| > t(critical), reject the null hypothesis
  If the confidence interval for the difference between the means does not include zero, reject the null hypothesis
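Two of the three checks above (t against critical t, and the confidence interval for the difference) can be sketched by hand. This assumes equal variances, so a pooled variance is used; the two groups are invented, and the critical t for df = 10 is copied from a standard two-tailed 95% t table.

```python
import math
import statistics

T_CRIT = 2.228  # two-tailed 95% critical t for df = 10 (standard t table)

def pooled_t(a, b):
    """Independent-samples t assuming equal variances; returns (t, diff, se)."""
    na, nb = len(a), len(b)
    diff = statistics.mean(a) - statistics.mean(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return diff / se, diff, se

# Hypothetical measurements for two strata, six each (df = 6 + 6 - 2 = 10)
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2]
group_b = [6.3, 6.8, 7.1, 6.5, 7.0, 6.9]

t_actual, diff, se = pooled_t(group_a, group_b)
ci = (diff - T_CRIT * se, diff + T_CRIT * se)

reject_by_t = abs(t_actual) > T_CRIT
reject_by_ci = not (ci[0] <= 0 <= ci[1])
```

Both checks agree here, as they should, since they are built from the same standard error and critical value.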


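Evaluating Significance

[Figure: significance scale starting at 0, with the critical value 0.050 marked. The region from 0 to 0.050 is labeled “Reject Null Hypothesis”; the region above 0.050 is labeled “Accept Null Hypothesis”. An X marks the actual significance.]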


Paired T Test Evaluation

Checks before and after test cases
Includes a correlation factor (like ‘r’)
  Can use paired test if significance < 0.050
  Larger correlation factor means stronger relationship between the variables
Test is evaluated like the Independent T Test
  Significance, ‘t’ value, and confidence interval
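A rough sketch of the paired t statistic, which is a one-sample t on the before/after differences. The six before and after readings are invented, and the critical t for df = 5 is copied from a standard two-tailed 95% t table. The correlation check mentioned above is not shown.

```python
import math
import statistics

T_CRIT = 2.571  # two-tailed 95% critical t for df = 5 (standard t table)

# Hypothetical readings for the same six subjects before and after an event
before = [120, 132, 118, 140, 125, 130]
after = [114, 128, 117, 131, 121, 124]

differences = [b - a for b, a in zip(before, after)]
mean_diff = statistics.mean(differences)
se = statistics.stdev(differences) / math.sqrt(len(differences))

t_actual = mean_diff / se
ci = (mean_diff - T_CRIT * se, mean_diff + T_CRIT * se)
reject_null = abs(t_actual) > T_CRIT
```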


One-Sample T Test

Compare a sample mean to a fixed value
Test shows the actual values of means, with their std deviation and std error
Same interpretation of results
  Significance, ‘t’ value, and confidence interval


F Test and ANOVA

Compare several means against each other using Analysis of Variance (ANOVA) and the F test
  Like extending the T tests to many variables
Want data from random samples of normal populations with equal variances


F Test and ANOVA

Output includes the Levene test
  Want significance for Levene > 0.050, so that equal variances can be assumed
  Otherwise, should not use ANOVA
Evaluate F by its significance
  If Sig. < 0.050, reject the null hypothesis (there is a significant difference among the means)
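To show where F comes from, here is a minimal one-way ANOVA computed by hand. The three groups are invented, the Levene test is not included, and because the standard library cannot compute the significance of F, the result is compared to a critical F copied from a standard 95% F table instead.

```python
import statistics

F_CRIT = 3.89  # 95% critical F for df = (2, 12), from a standard F table

def one_way_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for three groups of five
groups = [[4, 5, 6, 5, 5], [7, 8, 6, 7, 7], [9, 10, 9, 8, 9]]
f_actual = one_way_f(groups)
reject_null = f_actual > F_CRIT
```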


Additional ANOVA Tests

Once the F test shows there is some difference in the means across a subset, additional ANOVA tests can help identify more specific trends and differences
Types of tests (see end of lecture 6) include
  Pairwise Multiple Comparisons
  Post Hoc Range Tests


Pairwise Multiple Comparisons

Pairwise Multiple Comparisons check two subsets of data at a time
  Bonferroni test is better for a small number of subsets
  Tukey test is better for many subsets
  Both assume subset variances are equal
For each pair of subset values, Sig < 0.050 means the difference in means is significant


Post Hoc Range Tests

Post Hoc Range Tests look for groups within each subset which all have similar means
  Tukey and Tukey’s-b tests include Post Hoc Range Tests
Each column of the output is a subset with statistically similar means
  Subsets may overlap substantially


Contrasts Across Means

Look across subset means to see if there is a trend, such as a linear increase or decrease across subsets
Can check for Linear, Quadratic, or Cubic relationships (i.e. first, second, or third order polynomials)
Check Significance of F for the Unweighted version of each relationship (Linear, etc.); if Sig. < 0.050, reject the null hypothesis


Determine Linearity

An option under Compare Means / Means allows checking just for linearity
This confirms the ANOVA test result for Linearity
And gives R and Eta parameters, which are Measures of Association


R and Eta

Pearson’s R * measures how well the data fits the regression (-1 is a perfect negative correlation, 0 is no relationship, 1 is perfect positive correlation); R squared describes the amount of variance shared between the variables
Eta squared gives how much of the variance in one variable is explained by the changes in the other variable

* Named for English statistician Karl Pearson, 1857-1936 (per http://human-nature.com/nibbs/03/kpearson.html)


Regression Analysis

Regression Analysis looks at two interval or ratio-scaled variables (generically X and Y) and tries to fit an equation between them
A dozen different equations are available
  Linear, Power, Logarithmic, Exponential, etc.
Significance is checked by ANOVA F, and Sig. of the regression coefficients; association is measured with R Squared
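For the linear case only, fitting the equation Y = b0 + b1*X and measuring association with R Squared can be sketched by ordinary least squares. The five data points are invented, and the significance checks (ANOVA F, Sig. of the coefficients) are not shown.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx                       # slope
    b0 = mean_y - b1 * mean_x            # intercept
    r_squared = sxy ** 2 / (sxx * syy)   # share of variance in y explained by x
    return b0, b1, r_squared

# Hypothetical, nearly linear data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, r_squared = fit_line(xs, ys)
```

The other equation types (power, logarithmic, exponential) are commonly handled by transforming X or Y and then fitting a line the same way.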


Regression Analysis

For a regression to have any significance, we must have ANOVA’s Sig. F < 0.050
Then each variable’s coefficient (b0, b1, etc.) must have significance < 0.050
  Otherwise the coefficient might be zero
Then the better regression equations are ranked in order of strength by R Square, which is confirmed visually by plotting


Regression Analysis

The standard error of coefficients is given, so confidence intervals can be formed
Also helps report them meaningfully, so you don’t report a value as 4.861435 if it has a standard error of 0.92
  Depending on the accuracy of the source data, you could report that result as 5 +/- 1, or 4.9 +/- 0.9, or 4.86 +/- 0.92
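The reporting rule above amounts to rounding the value and its standard error to the same number of decimal places. A tiny sketch, with a function name chosen for illustration:

```python
def report(value, std_error, decimals):
    """Format a value and its standard error to the same number of decimals."""
    return f"{value:.{decimals}f} +/- {std_error:.{decimals}f}"

# The example from the slide: 4.861435 with a standard error of 0.92
options = [report(4.861435, 0.92, decimals) for decimals in (0, 1, 2)]
```

Which of the three to use depends on the accuracy of the source data, as noted above.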


Crosstabs

Crosstabs display data sorted by two or more variables in table form
Often just counts of each category, and/or the percentage of counts
Recoding data allows interval or ratio scale data to be put into groups (e.g. age 18-25)
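A small sketch of recoding and cross-tabulating with `collections.Counter`. The survey responses and the age groups other than 18-25 are invented for the example.

```python
from collections import Counter

def recode_age(age):
    """Recode a ratio-scaled age into an ordinal group."""
    if age < 18:
        return "under 18"
    if age <= 25:
        return "18-25"
    return "over 25"

# Hypothetical survey responses: (age, answer)
respondents = [(19, "yes"), (23, "no"), (31, "yes"),
               (45, "yes"), (17, "no"), (25, "yes")]

# Count each (age group, answer) cell, then convert counts to percentages
crosstab = Counter((recode_age(age), answer) for age, answer in respondents)
total = sum(crosstab.values())
percentages = {cell: 100 * count / total for cell, count in crosstab.items()}
```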


Pearson’s Chi Square

Measures how much the actual (observed) data differs from an even (expected) distribution of data
The “expected” data can be a random distribution (same number of counts per cell), or adjusted for the actual total counts for each row and column
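A minimal sketch of chi square for a crosstab, using the second kind of "expected" data described above: each expected count is adjusted for the row and column totals. The 2x2 table of counts is invented.

```python
def chi_square(observed):
    """Pearson chi square; expected = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    return chi2, expected

# Hypothetical counts
observed = [[30, 10],
            [20, 40]]
chi2, expected = chi_square(observed)

# df = (rows - 1) * (columns - 1) = 1; the 95% critical chi square is 3.841
reject_null = chi2 > 3.841
```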


Pearson’s Chi Square Evaluation

When chi square is larger than the critical value, reject the null hypothesis
Or if the significance of chi square is < 0.050, reject the null hypothesis
Can also generate Chi square for a single variable
Beware that Chi square is less meaningful for large matrices
  Or, it’s too easy for large matrices to show significance falsely using Chi square


Residuals

A residual is the difference between the Observed and Estimated values for a cell
Residuals can be plotted to look for outliers
Residuals can be standardized by dividing by their standard deviation
Cells with a standardized residual magnitude > 2 contribute a lot to Chi square


Measures of Association

Measures of Association between two variables can be symmetric or directional
Dozens of measures have been developed to work with chi square test
Interpret them like ‘r’ - zero means no correlation, larger values mean a stronger correlation
  Some can be > 1


Measures of Association

Symmetric measures don’t care which variable is dependent (Y)
Directional measures DO care which variable is dependent (A = f(B) is not B = f(A))
Some directional measures have a “symmetric” value, the weighted average of the other two


Symmetric Measures

The “Contingency Coefficient” is the main symmetric measure with a Chi Square test
  Works even with nominal data
  Evaluated like Pearson’s r
Phi and Cramer’s V are other symmetric measures
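All three symmetric measures are simple functions of chi square and the total count. The slides do not give the formulas, so the standard textbook definitions are assumed here, and the chi square value and table size are hypothetical.

```python
import math

def contingency_coefficient(chi2, n):
    """C = sqrt(chi2 / (chi2 + N))"""
    return math.sqrt(chi2 / (chi2 + n))

def phi(chi2, n):
    """Phi = sqrt(chi2 / N)"""
    return math.sqrt(chi2 / n)

def cramers_v(chi2, n, rows, cols):
    """V = sqrt(chi2 / (N * (min(rows, cols) - 1)))"""
    return math.sqrt(chi2 / (n * (min(rows, cols) - 1)))

# Hypothetical result: chi square of 50/3 from a 2x2 table of 100 counts
chi2, n = 50 / 3, 100
c = contingency_coefficient(chi2, n)
p = phi(chi2, n)
v = cramers_v(chi2, n, rows=2, cols=2)
```

For a 2x2 table Phi and Cramer's V coincide; they differ only for larger tables.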


Directional Measures

Directional measures range from 0 to 1
Lambda is the recommended directional measure - tells what proportion of the dependent variable is predicted by the independent variable (like Eta)
Eta can be applied here if one variable is interval or ratio scaled


Relative Risk and Odds Ratio

Use only with 2x2 tables
Are quite directional
Tell how much more likely one cell is to occur than the others
Need to be very careful when interpreting
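A sketch of both measures for a 2x2 table, using their standard definitions (not spelled out in the slides). The table layout, with rows as groups and columns as event / no event, and the counts themselves are assumptions for the example.

```python
def relative_risk(table):
    """Risk of the event in row 1 divided by the risk in row 2."""
    (a, b), (c, d) = table
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(table):
    """Odds of the event in row 1 divided by the odds in row 2."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

# Hypothetical counts: rows = exposed / unexposed, columns = event / no event
table = [[30, 70],
         [10, 90]]
rr = relative_risk(table)
odds = odds_ratio(table)
```

Swapping the rows inverts both measures, which is one reason they need careful interpretation: the answer depends on which group is treated as the reference.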


Square Tables

Tables with the same number of rows and columns (RxR), and the same variables in those rows and columns, can use kappa
  Measures strength of association, like ‘r’
  Check results for significance (<0.050)
  Then judge the value of kappa using a fixed scale


General RxC Measures

Many measures can be used with a general table of R rows and C columns
Gamma is the recommended measure (symmetric)
Spearman’s Correlation Coefficient is also widely used
  Ranges from -1 to +1, based on ordered categories


Yule’s Q

Yule’s Q is a special case of gamma for a 2x2 table
Is judged on a fixed scale, like ‘r’
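To illustrate the "special case" relationship, here is a sketch of gamma for a general RxC table of ordered categories alongside Yule's Q for a 2x2 table. The definitions used are the standard ones (concordant versus discordant pairs), which the slides do not spell out, and the counts are invented.

```python
def gamma(table):
    """Gamma = (concordant - discordant) / (concordant + discordant) pairs."""
    rows, cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):      # only cells in lower rows
                for m in range(cols):
                    if m > j:                 # lower and to the right
                        concordant += table[i][j] * table[k][m]
                    elif m < j:               # lower and to the left
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)

def yules_q(table):
    """Q = (ad - bc) / (ad + bc) for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d - b * c) / (a * d + b * c)

# Hypothetical 2x2 counts
table = [[30, 10],
         [20, 40]]
g = gamma(table)
q = yules_q(table)
```

On a 2x2 table the only concordant pairs are a with d and the only discordant pairs are b with c, so the two functions return the same value.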