Topic 9 / Analysis

Grand Alexandra

Analysis

1. Data Preparation: organizing the data

2. Descriptive Statistics: describing the data

3. Inferential Statistics: testing hypotheses and models

Conclusion Validity

Conclusion validity: Is there a relationship between two variables (between cause and effect)? Is the conclusion about the relationship reasonable? The conclusion is either „there is no relationship“ or „there is a relationship“.

Internal validity: Assuming that there is a relationship in this study, is the relationship a causal one, or could a „third variable“ explain it?

Example: „The more television sets there are, the worse the PISA performance becomes.“ (Presse, 10.12.2010. PISA-Sieger: Weiblich und ohne TV)

Threats to conclusion validity

A threat is anything that leads to an incorrect conclusion about a relationship in the observations.

1. Concluding that there is no relationship when in fact there is one („missing the needle in the haystack“). This is a signal-to-noise ratio problem:

• „signal“: the relationship you are trying to see
• „noise“: factors that make it hard to see the relationship

2. Concluding that there is a relationship when in fact there is not („seeing things that aren't there“).

Threats to conclusion validity

„Finding no relationship when there is one“ (conclusion: no relationship; reality: relationship)

threats:
• low reliability of measures
• low reliability of treatment implementation
• random irrelevancies in the setting
• random heterogeneity of respondents
-> these „noise“-producing factors add variability and lead to low statistical power
• violation of assumptions of statistical tests

„Finding a relationship when there is not one“ (conclusion: relationship; reality: no relationship)

threats:
• fishing and the error rate problem
• violation of assumptions of statistical tests
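The error rate problem behind fishing can be illustrated with a small calculation. The sketch below assumes k independent tests that are each run at the same α-level; the function name is illustrative and does not come from the slides.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one Type I error in k independent tests."""
    return 1.0 - (1.0 - alpha) ** k


# The more tests are run on the same data, the more likely it is that
# at least one of them is "significant" by chance alone.
for k in (1, 5, 10, 45):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

Under these assumptions a single test keeps the error rate at α, while many tests push it far above α.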

Improving Conclusion Validity

• good statistical power (should be > 0.8)
  power = „the odds of saying that there is a relationship when in fact there is one“
• good reliability -> reduces „noise“
• good implementation

Factors that affect power:

• sample size: use a larger sample size
• effect size: increase the effect size (e.g. increase the dosage of the program); signal -> increase, noise -> decrease
• α-level: raise the α-level (e.g. from 0.01 to 0.05); this also raises the risk of a Type I error
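How the three factors act on power can be sketched numerically. The example below is only an approximation under stated assumptions: a two-sided one-sample z-test with a standardized effect size (signal divided by noise). The function name and the example values are hypothetical and not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided one-sample z-test.

    effect_size is the standardized mean difference (signal / noise).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * sqrt(n)
    # Probability of landing in either rejection region when HA is true.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


print("baseline:      ", round(z_test_power(0.5, 15), 2))
print("larger sample: ", round(z_test_power(0.5, 40), 2))
print("larger effect: ", round(z_test_power(0.8, 15), 2))
print("higher alpha:  ", round(z_test_power(0.5, 15, alpha=0.10), 2))
```

Each of the three changes raises the power relative to the baseline; only the last one does so at the price of a higher Type I error risk.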

Statistical Inference Decision Matrix

• two mutually exclusive hypotheses (H0, HA)
• decision: which hypothesis to accept and which to reject

                              REALITY
CONCLUSION      H0 is true                      HA is true
accept H0       decision right                  decision wrong
                1-α (e.g. 0.95)                 β (e.g. 0.20)
                confidence level                β-error (Type II Error)
accept HA       decision wrong                  decision right
                α (e.g. 0.05)                   1-β (e.g. 0.80)
                α-error (Type I Error),         power
                significance level

Statistical Inference Decision

[Figure: two normal density curves, one for „H0 right“ and one for „HA right“ (x-axis: x from 60 to 140; y-axis: density from 0.00 to 0.04, labeled dnorm(x, mean = 100, sd = 10)); the areas α, β, 1-α and 1-β (POWER) are marked.]

what we want: high power and low Type I Error

problem: for a fixed sample size and effect size, the higher the power, the higher the Type I Error
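The trade-off between the two error types can be made concrete with two normal curves. This is a sketch under assumptions: the H0 curve uses mean 100 and sd 10 from the axis label of the figure, while the HA curve (mean 120, sd 10) and the one-sided decision rule are hypothetical and not given in the slides.

```python
from statistics import NormalDist

h0 = NormalDist(mu=100, sigma=10)   # from the axis label dnorm(x, mean = 100, sd = 10)
ha = NormalDist(mu=120, sigma=10)   # hypothetical alternative, not given in the slides


def error_rates(critical_value: float) -> tuple[float, float]:
    """Return (alpha, beta) for the rule 'accept HA if x > critical_value'."""
    alpha = 1.0 - h0.cdf(critical_value)  # H0 is true, but HA is accepted
    beta = ha.cdf(critical_value)         # HA is true, but H0 is accepted
    return alpha, beta


for c in (105, 110, 115, 120):
    alpha, beta = error_rates(c)
    print(f"c={c}: alpha={alpha:.3f}, beta={beta:.3f}, power={1 - beta:.3f}")
```

Moving the critical value to the right lowers α but raises β, so the power drops; moving it to the left does the opposite.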

Practical

Case 1: A child who is „in reality“ highly gifted is diagnosed as not highly gifted.

Which error is this? α-error (Type I Error) or β-error (Type II Error)?

Case 2: The result of a study: WU students with a HAK degree achieve a higher score on the multiple-choice exam in accounting. In reality, however, there is no difference in the score achieved between HAK graduates and non-HAK graduates.

Which error is this? α-error (Type I Error) or β-error (Type II Error)?

Practical

Tick the correct answer and correct the wrong answers.

Raising the α-error from 0.01 to 0.05 ...

• lowers the power (wrong; correct: the power rises)
• lowers the chances of making a Type I error (wrong; correct: they rise)
• lowers the chances of making a β-error (correct)
• makes the test more restrictive (wrong; correct: less restrictive)

Analysis

Example dataset „Arbeitszufriedenheit“ (job satisfaction), AZ

Dataset: AZ.sav

Note: For illustration purposes, the data were chosen arbitrarily from a dataset*! Any results should therefore not be taken too seriously.

Sample size: n = 15

Variables:

• dichotomous: SEX, items on the constructs job satisfaction** (AZ_...), working atmosphere** (BK_...), workload** (AB_...)
• ordinal: POSITION (position in the company)
• metric: MITARB (number of employees), NETTO (monthly net income in €)
• new variable: AZ „job satisfaction“ (assumption: interval-scaled!), sum score of the individual variables AZ_...

* Böhnisch, B., Grand, A., Rechberger, R., Wimmer, W. (2006). Berufliche Zufriedenheit. Seminararbeit aus Empirische Forschungsmethoden.
** Items were taken from: Giegler, H. (1985). Rasch-Skalen zur Messung von „Arbeits- und Berufszufriedenheit“, „Betriebsklima“ und „Arbeits- und Berufsbelastung“ auf Seiten der Betroffenen.

1. Data Preparation

1. Logging the data

2. (Checking the data for accuracy)

3. Developing a database structure: codebook (Kodierungsschema)

4. Entering the data into the computer (single entry or double entry); checking the data for accuracy

5. Data Transformation

• missing values

• item reversals (example: transform reversed items, e.g. BK_2: old value 1 „agree“, 2 „disagree“ -> new value 2 „agree“, 1 „disagree“)

• recode variables (example: transform items „AZ_...“, „AB_...“, „BK_...“: old value 1 „agree“, 2 „disagree“ -> new value 1 „agree“, 0 „disagree“)

• scale totals (example: generate the new variable „AZ“ (job satisfaction); to get a total score for AZ, add across the individual items AZ_...)

• categories
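The transformation steps for item reversals, recoding and scale totals can be sketched as follows. The three respondents and their answers are hypothetical; only the variable names and the coding rules follow the examples above, and applying the reversal before the recoding is an assumption about the order of the steps.

```python
# Hypothetical raw answers for three respondents (1 = "agree", 2 = "disagree").
raw = [
    {"ID": 1, "AZ_1": 1, "BK_2": 2, "AZ_4": 1},
    {"ID": 2, "AZ_1": 2, "BK_2": 1, "AZ_4": 1},
    {"ID": 3, "AZ_1": 2, "BK_2": 2, "AZ_4": 2},
]

REVERSED_ITEMS = {"BK_2"}            # reversed item named in the example
ITEMS = ("AZ_1", "BK_2", "AZ_4")


def transform(row: dict) -> dict:
    out = dict(row)
    for item in ITEMS:
        value = out[item]
        if item in REVERSED_ITEMS:
            value = 3 - value        # item reversal: 1 <-> 2
        out[item] = 2 - value        # recode: 1 "agree" -> 1, 2 "disagree" -> 0
    # scale total: sum score across the individual AZ_ items
    out["AZ"] = sum(v for k, v in out.items() if k.startswith("AZ_"))
    return out


prepared = [transform(row) for row in raw]
for row in prepared:
    print(row)
```

Keeping the raw values untouched and writing the transformed values to a copy makes it possible to re-check the recoding later.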

1. Data Preparation - Codebook

[Figure: questionnaire annotated with the variable names ID, SEX, MITARB, NETTO, POSITION, AZ_1, BK_2, AB_3, AZ_4, BK_5 and the numeric codes of the answer options]

The codebook should include:

• variable name
• variable description
• variable format
• instrument/method of collection
• date collected
• respondent or group
• variable location in database

1. Data Preparation - Checking data for accuracy

summarize (e.g. frequency table) and check the data

• are the listed values reasonable? („wild codes“, outliers)
• are there missing values?

[Figure: frequency tables with annotations marking a „wild code“, an outlier and „missing values“]

outlier:
• it actually is an outlier, or
• error in data entry

„missing values“:
• no data exist, or
• data weren't entered
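A frequency table makes both kinds of problems visible at a glance. The sketch below uses hypothetical entries for SEX and assumes that 1 and 2 are the only valid codes and that a missing value is stored as None; a wild code is then any value outside the set of valid codes.

```python
from collections import Counter

# Hypothetical entries for SEX; None marks a missing value.
VALID_SEX_CODES = {1, 2}
sex = [1, 2, 2, 1, None, 1, 3, 2, 1, 2]

frequencies = Counter(sex)           # frequency table of all entered values
wild_codes = sorted(
    v for v in frequencies if v is not None and v not in VALID_SEX_CODES
)
n_missing = frequencies[None]

print(frequencies)
print("wild codes:", wild_codes)
print("missing values:", n_missing)
```

Whether a flagged value is a data-entry error or a genuine observation still has to be checked against the original questionnaire.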

2. Descriptive Statistics

Descriptive statistics
• „quantitative description in a manageable form“
• describe basic features of the data, provide simple summaries
• simple graphics analysis

Univariate Analysis: analysis of one variable at a time

Description of a single variable:

• distribution

• central tendency (Lagemaß)

• dispersion (Streuungsmaß)

Bivariate Analysis: analysis of two variables at a time

Multivariate Analysis: analysis of multiple variables at a time

2. Descriptive Statistics - Distribution

Frequency distribution

table:
• absolute frequencies
• relative frequencies
• frequency table
• crosstab

graph:
• absolute frequencies
• relative frequencies
• pie chart, bar chart, boxplot, histogram, (stem and leaf diagram), ...

Frequency table: sex

Sex      Absolute frequency   Relative frequency
male     8                    53%
female   7                    47%
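Relative frequencies follow directly from the absolute ones. A minimal sketch using the counts for sex reported on this slide (8 male, 7 female); the variable names are illustrative.

```python
absolute = {"male": 8, "female": 7}      # absolute frequencies for sex
n = sum(absolute.values())               # sample size

# relative frequency = absolute frequency / sample size
relative = {category: count / n for category, count in absolute.items()}

for category, count in absolute.items():
    print(f"{category:<8}{count:>3}{relative[category]:>6.0%}")
```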

2. Descriptive Statistics - Distribution: graphs

[Figures: pie chart of sex (male 53%, female 47%); bar chart of position (lower, middle and upper position; frequency axis from 0 to 7); histogram of monthly net income; boxplot of number of employees]

2. Descriptive Statistics - Central Tendency

Measures of central tendency (Lagemaße)

Mean x̄ (Mittelwert)
• computation: „sum of values xi / number n of values“
• data: metric data
• adequacy: appropriate if the distribution is approximately normal; not robust against single extreme values („outliers“)

Median x̃
• computation: „center of the sample“
• data: ordinal data, metric data
• adequacy: robust against outliers

Mode (Modus)
• computation: „most frequently occurring value“
• data: nominal data, ordinal data, metric data
• adequacy: robust against outliers

2. Descriptive Statistics - Central Tendency / Practical

Compute the mean, median and mode of the variables SEX, MITARB (number of employees) and POSITION. Make sure to apply each measure only where it is meaningful!

Hint: sort the variables MITARB and POSITION in ascending order.

2. Descriptive Statistics - Central Tendency / Practical_Solution

Variable                       Mean   Median   Mode
MITARB (number of employees)   48.5   18       7
POSITION                       -      2        1
SEX                            -      -        1
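Which measure is meaningful depends on the scale level of the variable. The sketch below uses hypothetical values, not the AZ.sav data, and assumes the codes 1, 2, 3 for lower, middle and upper position.

```python
from statistics import mean, median, mode

# Hypothetical values, not the AZ.sav data.
employees = [3, 7, 7, 12, 18, 25, 299]   # metric, with one extreme value
position = [1, 1, 1, 2, 2, 3, 3]         # ordinal: 1 lower, 2 middle, 3 upper
sex = [1, 2, 1, 1, 2, 2, 1]              # nominal codes

# metric data: mean, median and mode are all defined
print("employees:", mean(employees), median(employees), mode(employees))
# ordinal data: median and mode, but no mean
print("position: ", median(position), mode(position))
# nominal data: only the mode
print("sex:      ", mode(sex))
```

In the hypothetical employee data the single extreme value pulls the mean far above the median, which is the same pattern as in the solution table (mean 48.5, median 18).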

2. Descriptive Statistics - Dispersion

Measures of dispersion (Streuungsmaße)

Variance s²
• computation: „average of the sum of the squared deviations“
• data: metric data

Standard Deviation s
• computation: „square root of the variance“
• data: metric data

Range (Spannweite)
• computation: „highest value minus lowest value“
• data: ordinal data, metric data

2. Descriptive Statistics - Dispersion

Measures of dispersion (Streuungsmaße)

Interquartile range IQR
• computation: „difference between third and first quartile“
  1st quartile (Q1): 25% of the cases fall below this value
  median (Q2): 50% of the cases fall above and below this value
  3rd quartile (Q3): 75% of the cases fall below this value
• data: metric data
• adequacy: robust against outliers

[Figure: boxplot marking min, Q1, Q2 = median, Q3 and max; each of the four segments contains 25% of the cases; the IQR spans Q1 to Q3]
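The robustness of the IQR can be checked on a small example. The data are hypothetical, and the quartiles are computed with linear interpolation (method="inclusive" in Python's statistics module); other software may use a slightly different quartile convention and return slightly different values.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range Q3 - Q1 (quartiles by linear interpolation)."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]   # hypothetical extreme value

for data in (clean, with_outlier):
    print("range:", max(data) - min(data), "IQR:", iqr(data))
```

The range reacts strongly to the single extreme value, while the IQR stays the same.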

2. Descriptive Statistics - Dispersion / Practical_Solution

Computation of the variance, standard deviation and range of the variable NETTO (net income): n = 15, mean = 1553.3, min = 200, max = 2800

Variable             Variance     Standard Deviation   Range (Spannweite)   Min   Max
NETTO (net income)   471595.238   686.728              2600                 200   2800

Steps (Variance):

1. compute the distance between each value and the mean

2. square each deviation

3. sum the squares to get the Sum of Squares (SS) value

4. divide the SS by n - 1
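The four steps translate directly into code. The income values below are hypothetical, since the raw AZ.sav data are not reproduced here; the result is compared with the sample variance from Python's statistics module as a cross-check.

```python
from math import sqrt
from statistics import variance


def sample_variance(values):
    m = sum(values) / len(values)
    deviations = [x - m for x in values]      # step 1: distance to the mean
    squares = [d ** 2 for d in deviations]    # step 2: square each deviation
    ss = sum(squares)                         # step 3: Sum of Squares (SS)
    return ss / (len(values) - 1)             # step 4: divide SS by n - 1


netto = [200, 1200, 1500, 1600, 2800]         # hypothetical incomes in €
s2 = sample_variance(netto)

print("variance:", s2)
print("standard deviation:", sqrt(s2))
print("range:", max(netto) - min(netto))
print("matches statistics.variance:", abs(s2 - variance(netto)) < 1e-9)
```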

Correlation

„A correlation is a single number that describes the degree of relationship between two variables“

The correlation coefficient r lies between -1 and 1 (-1 ≤ r ≤ 1); the higher the absolute value of r, the stronger the linear relationship between the variables.

• uncorrelated: r = 0

• positive correlation: r > 0, positive relationship; the higher the x-values, the higher the y-values on average

• negative correlation: r < 0, negative relationship; the higher the x-values, the lower the y-values on average, and vice versa

• exact linear correlation: r = 1 (positive), r = -1 (negative)
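The cases listed above can be reproduced with a small implementation of the product-moment correlation. This is a sketch with hypothetical data; the function name is illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # sum of cross-products of the deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # exact positive linear relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))   # exact negative linear relationship
print(pearson_r(x, [2, 1, 4, 3, 5]))    # positive, but not exact
```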

Correlation - Example

Example:

Is there a relationship between the variable „Nettoverdienst“ (net income) and the variable „Arbeitszufriedenheit“ (job satisfaction)?

Descriptive statistics for net income and job satisfaction:

Variable                Mean      StDev     Variance     Sum     Min   Max    Range
Net income (NETTO)      1553.33   686.728   471595.238   23300   200   2800   2600
Job satisfaction (AZ)   5.20      2.178     4.743        78      1     9      8

If yes, ...

1. Which type of relationship?

2. How strong is the relationship?

3. Is the correlation significant?

Example - Descriptive Statistics

[Figures: boxplot of job satisfaction (AZ); boxplot of monthly net income in €]

Example – 1. Which type of relationship?

Example - 2. How strong is the relationship?

Calculating the correlation

Product-Moment-Correlation (Pearson)
• variables (x, y) are metric and normally distributed

[SPSS output: correlation AZ/NETTO]

Example - Q-Q Plot

[Figures: Q-Q plot of AZ (job satisfaction); Q-Q plot of monthly net income in €]

Example - 3. Is the correlation significant?

Testing the Significance of a Correlation

Null Hypothesis: r = 0
Alternative Hypothesis: r ≠ 0

Steps:

1. determine the significance level (alpha-level): α = 0.05

2. compute the degrees of freedom: df = N - 2 = 15 - 2 = 13

3. one-tailed or two-tailed test? two-tailed test

4. look up the critical value

Example - 3. Is the correlation significant?

[Table excerpt: t-distributions for product-moment correlations (critical values)]

The correlation is significant: r (0.692) > r_crit (0.514)

[SPSS output: correlation AZ/NETTO]
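The comparison with the critical value can be retraced by hand. The sketch below uses r = 0.692 and N = 15 from the example; the two-tailed critical t-value of 2.160 for α = 0.05 and df = 13 is taken from a standard t table and is not part of the slides. The conversion of the critical t into a critical r uses the usual relation t = r·√df / √(1 - r²).

```python
from math import sqrt

r = 0.692          # correlation AZ/NETTO from the example
n = 15
df = n - 2

# t statistic for testing the null hypothesis r = 0
t = r * sqrt(df) / sqrt(1.0 - r ** 2)

# Two-tailed critical t for alpha = 0.05 and df = 13 (from a standard t table).
T_CRIT = 2.160
# The same threshold expressed on the r scale.
r_crit = T_CRIT / sqrt(T_CRIT ** 2 + df)

print(f"t = {t:.3f}, t_crit = {T_CRIT}")
print(f"r_crit = {r_crit:.3f}")
print("significant:", abs(r) > r_crit)
```

Both comparisons, t against the critical t and r against the critical r, lead to the same decision.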

Correlation Matrix

• symmetric matrix

• relationships between all possible pairs of variables

• number of unique correlations for N variables: N*(N-1) / 2, e.g. between C1,…,C10: 10*9 / 2 = 45 unique correlations
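The count of unique correlations can be verified by enumerating all pairs of variables. A minimal sketch; the variable names C1 to C10 follow the example above, the function name is illustrative.

```python
from itertools import combinations


def unique_pairs(n_variables: int) -> int:
    """Number of unique correlations among n_variables: N*(N-1)/2."""
    return n_variables * (n_variables - 1) // 2


variables = [f"C{i}" for i in range(1, 11)]   # C1, ..., C10
pairs = list(combinations(variables, 2))      # every unordered pair once

print(len(pairs), unique_pairs(len(variables)))
```

Because the matrix is symmetric, only the pairs below (or above) the diagonal need to be computed.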

Other correlations

• Pearson Product Moment Correlation (bivariate normal distribution, variables on interval scale)

• Spearman Rank Order Correlation (rho) (two ordinal variables)

• Kendall Rank Order Correlation (tau) (two ordinal variables)

• Point-Biserial Correlation (one variable is on a continuous interval level and the other is dichotomous)

References

Core reading:

Trochim, W. & Donnelly, J.: The Research Methods Knowledge Base (3rd edition). Atomic Dog. Internet WWW page, URL: http://www.socialresearchmethods.net/kb/ (version current as of October 20, 2006).

Bortz, J., Döring, N. (2006). Forschungsmethoden und Evaluation. Heidelberg: Springer Verlag.

Hatzinger, R. (2006). Angewandte Statistik mit SPSS. Wien: Facultas.

Hatzinger, R., Nagel, H. (2009). PASW Statistics. Statistische Methoden und Fallbeispiele. München: Pearson Studium.

Nagel, H. (2003). Empirische Sozialforschung.
