Topic 9 / Analysis
Grand Alexandra
Analysis
1. Data Preparation: organizing the data
2. Descriptive Statistics: describing the data
3. Inferential Statistics: testing hypotheses and models
Conclusion Validity
Conclusion validity: Is there a relationship between two variables (between cause and effect)?
• possible conclusions: there is no relationship / there is a relationship
• Is the conclusion about the relationship reasonable?
Internal validity: Assuming that there is a relationship in this study, is the relationship a causal one?
• Or is a "third variable" involved?
Example: "The more televisions there are, the worse the PISA performance becomes." (Presse, 10.12.2010. PISA-Sieger: Weiblich und ohne TV)
Threats to conclusion validity
Incorrect conclusion about a relationship in the observation:
1. Concluding that there is no relationship when in fact there is one ("missing the needle in the haystack")
• "noise": factors that make it hard to see the relationship
• "signal": the relationship you are trying to see
• signal-to-noise ratio problem
2. Concluding that there is a relationship when in fact there is not ("seeing things that aren't there")
Threats to conclusion validity
"Finding no relationship when there is one" (conclusion: no relationship; reality: relationship)
threats:
• low reliability of measures
• low reliability of treatment implementation
• random irrelevancies in the setting
• random heterogeneity of respondents
• violation of assumptions of statistical tests
"Noise"-producing factors add variability -> low statistical power

"Finding a relationship when there is not one" (conclusion: relationship; reality: no relationship)
threats:
• fishing and the error rate problem
• violation of assumptions of statistical tests
Improving Conclusion Validity
• good statistical power (should be > 0.8)
power = "the odds of saying that there is a relationship, when in fact there is one"
• good reliability -> reduces "noise"
• good implementation
Factors that affect power:
• sample size: use a larger sample
• effect size: increase the effect size (e.g. increase the dosage of the program); signal -> increase, noise -> decrease
• α-level: raise the α-level (this also raises the risk of a Type I error)
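The three factors can be tried out numerically. The following is only a sketch under stated assumptions: it uses a one-sided one-sample z-test (not taken from the slides), and the function name `z_test_power` as well as all effect sizes and sample sizes are hypothetical illustration values.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect_size: float, n: int, alpha: float) -> float:
    """Power of a one-sided one-sample z-test (assumed test, for illustration).

    effect_size is the standardized difference: (true mean - H0 mean) / sd.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection threshold under H0
    # Under HA the test statistic is shifted by effect_size * sqrt(n).
    return 1 - std_normal.cdf(z_crit - effect_size * sqrt(n))


# Hypothetical baseline and one change per factor.
baseline = z_test_power(effect_size=0.5, n=15, alpha=0.05)
larger_sample = z_test_power(effect_size=0.5, n=40, alpha=0.05)
larger_effect = z_test_power(effect_size=0.8, n=15, alpha=0.05)
higher_alpha = z_test_power(effect_size=0.5, n=15, alpha=0.10)

for label, power in [
    ("baseline", baseline),
    ("larger sample size", larger_sample),
    ("larger effect size", larger_effect),
    ("higher alpha-level", higher_alpha),
]:
    print(f"{label:<20} power = {power:.2f}")
```

Each change raises the power relative to the baseline; only the last one does so at the price of a higher Type I error risk.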
Statistical Inference Decision Matrix
• two mutually exclusive hypotheses (H0, HA)
• decision: which hypothesis to accept and which to reject

                REALITY
CONCLUSION      H0 is true                       HA is true
accept H0       decision right                   decision wrong
                1-α (e.g. 0.95)                  β (e.g. 0.20)
                confidence level                 β-error (Type II Error)
accept HA       decision wrong                   decision right
                α (e.g. 0.05)                    1-β (e.g. 0.80)
                α-error (Type I Error)           power
                significance level
Statistical Inference Decision
[Figure: two normal density curves, each plotted for x from 60 to 140 with the axis label dnorm(x, mean = 100, sd = 10). Marked areas: α and β. Region "H0 right": 1-α; region "HA right": 1-β (power).]
What we want: high power and a low Type I error.
Problem: for a fixed sample size and effect size, the higher the power, the higher the Type I error.
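The trade-off can be illustrated with two normal distributions. This is a sketch, not the figure from the slide: the H0 distribution uses mean = 100 and sd = 10 from the axis label, while the HA mean of 120 and the one-sided decision rule are assumptions made only for this example.

```python
from statistics import NormalDist

h0 = NormalDist(mu=100, sigma=10)  # distribution under H0 (values from the axis label)
ha = NormalDist(mu=120, sigma=10)  # distribution under HA; the mean 120 is an assumption


def beta_for_alpha(alpha: float) -> float:
    """Type II error probability for a one-sided test with the given alpha."""
    critical_value = h0.inv_cdf(1 - alpha)  # reject H0 for x above this value
    return ha.cdf(critical_value)           # P(accept H0 | HA is true)


results = {alpha: beta_for_alpha(alpha) for alpha in (0.01, 0.05, 0.10)}
for alpha, beta in results.items():
    print(f"alpha = {alpha:.2f}  beta = {beta:.3f}  power = {1 - beta:.3f}")
```

Moving the critical value to the left shrinks β (more power) but enlarges α, and vice versa.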
Practical
Case 1: A child who is "in reality" highly gifted is diagnosed as not highly gifted.
Which error is this? α-error (Type I Error) or β-error (Type II Error)?
Case 2: The result of a study: WU students with a HAK diploma achieve a higher score on the multiple-choice exam in accounting. In reality, however, there is no difference between HAK graduates and non-HAK graduates in the score achieved.
Which error is this? α-error (Type I Error) or β-error (Type II Error)?
Practical
Mark the correct answer and correct the wrong answers.
Raising the α-level from 0.01 to 0.05 …
• lowers the power -> wrong; corrected: raises the power
• lowers the chances of making a Type I error -> wrong; corrected: raises the chances
• lowers the chances of making a β-error -> correct
• makes the test more restrictive -> wrong; corrected: less restrictive
Analysis
Example data set "Arbeitszufriedenheit" (job satisfaction), AZ
Data set: AZ.sav
Note: The data were picked arbitrarily from a data set* for illustration purposes. Any results should therefore not be taken too seriously.
Sample size: n = 15
Variables:
• dichotomous: SEX; items for the constructs job satisfaction** (AZ_...), working climate** (BK_...), workload** (AB_...)
• ordinal: POSITION (position in the company)
• metric: MITARB (number of employees), NETTO (monthly net income in €)
• new variable: AZ "Arbeitszufriedenheit" (assumption: interval-scaled!), sum score of the individual variables AZ_...
* Böhnisch, B., Grand, A., Rechberger, R., Wimmer, W. (2006). Berufliche Zufriedenheit. Seminar paper in Empirische Forschungsmethoden.
** Items were taken from: Giegler, H. (1985). Rasch-Skalen zur Messung von „Arbeits- und Berufszufriedenheit“, „Betriebsklima“ und „Arbeits- und Berufsbelastung“ auf Seiten der Betroffenen.
1. Data Preparation
1. Logging the data
2. (Checking the data for accuracy)
3. Developing a database structure – Codebook (Kodierungsschema)
4. Entering the data into the computer (once only entry or double entry); Checking the data for accuracy
5. Data Transformation
• missing values
• item reversals (example: transform reversal items, e.g. BK_2: old value 1 "agree", 2 "disagree" -> new value 2 "agree", 1 "disagree")
• recode variables (example: transform items "AZ_...", "AB_...", "BK_...": old value 1 "agree", 2 "disagree" -> new value 1 "agree", 0 "disagree")
• scale totals (example: generate the new variable "AZ" (Arbeitszufriedenheit); to get a total score for AZ, add across the individual items AZ_...)
• categories
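The transformation steps can be sketched in a few lines. The responses below are hypothetical (AZ.sav itself is not reproduced here), the helper names are made up for this example, and treating only BK_2 as a reversal item is an assumption based on the example above.

```python
# Hypothetical responses: 1 = "agree", 2 = "disagree", None = missing value.
responses = [
    {"ID": 1, "AZ_1": 1, "BK_2": 2, "AZ_4": 1},
    {"ID": 2, "AZ_1": 2, "BK_2": 1, "AZ_4": 1},
    {"ID": 3, "AZ_1": 1, "BK_2": 2, "AZ_4": None},
]

REVERSED_ITEMS = {"BK_2"}  # assumption: only BK_2 is a reversal item


def reverse_item(value):
    """Item reversal on the 1/2 scale: 1 -> 2, 2 -> 1; missing stays missing."""
    return None if value is None else 3 - value


def recode_item(value):
    """Recode 1 "agree" -> 1 and 2 "disagree" -> 0; missing stays missing."""
    return None if value is None else 2 - value


def transform(row):
    out = {"ID": row["ID"]}
    for name, value in row.items():
        if name == "ID":
            continue
        if name in REVERSED_ITEMS:
            value = reverse_item(value)
        out[name] = recode_item(value)
    az_items = [out[name] for name in out if name.startswith("AZ_")]
    # Scale total: only computed when no AZ item is missing.
    out["AZ"] = None if None in az_items else sum(az_items)
    return out


transformed = [transform(row) for row in responses]
for row in transformed:
    print(row)
```

Leaving the total score empty when an item is missing is one possible design choice; other ways of handling missing values are equally conceivable.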
1. Data Preparation - Codebook
[Figure: codebook for the variables ID, SEX, MITARB, NETTO, POSITION, AZ_1, BK_2, AB_3, AZ_4, BK_5 with their numeric codes]
The codebook should include:
• variable name
• variable description
• variable format
• instrument/method of collection
• date collected
• respondent or group
• variable location in database
1. Data Preparation - Checking data for accuracy
summarize (e.g. frequency table) and check the data
• are the listed values reasonable? ("wild codes", outliers/Ausreißer)
• are there missing values?
outlier/Ausreißer:
• it actually is an outlier, or
• it is an error in data entry
missing values:
• no data exist, or
• the data weren't entered
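Such a check can be sketched with a simple frequency table. The entries below are hypothetical, and the set of valid codes is assumed to come from the codebook (1 "agree", 2 "disagree").

```python
from collections import Counter

VALID_CODES = {1, 2}                # assumed codebook: 1 "agree", 2 "disagree"
az_1 = [1, 2, 2, 1, None, 1, 7, 2]  # hypothetical entries; None = missing, 7 = wild code

# Frequency table: absolute frequency of every entered value (None counts missing values).
frequency_table = Counter(az_1)

missing_count = frequency_table[None]
wild_codes = sorted(
    value for value in frequency_table
    if value is not None and value not in VALID_CODES
)

print("frequency table:", dict(frequency_table))
print("missing values: ", missing_count)
print("wild codes:     ", wild_codes)
```

Whether a flagged value is a data entry error or a real observation still has to be checked against the original questionnaire.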
2. Descriptive Statistics
Univariate Analysis - Analysis of one variable at a time
Description of a single variable:
• distribution
• central tendency (Lagemaß)
• dispersion (Streuungsmaß)
Bivariate Analysis – Analysis of two variables at a time
Multivariate Analysis – Analysis of multiple variables at a time
Descriptive statistics
• "quantitative description in a manageable form"
• describe basic features of the data, provide simple summaries
• simple graphics analysis
2. Descriptive Statistics - Distribution
Frequency distribution (absolute frequencies, relative frequencies)
• table: frequency table, crosstab
• graph: pie chart, bar chart, boxplot, histogram, (stem and leaf diagram), …

Frequency table: Geschlecht (sex)
Geschlecht          absolute frequency   relative frequency
männlich (male)     8                    53%
weiblich (female)   7                    47%
2. Descriptive Statistics - Distribution (graphs)
[Figures: pie chart of sex (männlich 53%, weiblich 47%); bar chart of position (lower, middle, upper position); histogram of monthly net income; boxplot of number of employees]
2. Descriptive Statistics – Central Tendency
Central Tendencies / LAGEMASSE
Mean (Mittelwert), x̄
• computation: "sum of values xi / number n of values"
• data: metric data
• adequacy: suitable if the distribution is approximately normal; not robust against single extreme values ("outliers")
Median, x̃
• computation: "center of the sample"
• data: ordinal data, metric data
• adequacy: robust against outliers
Mode (Modus)
• computation: "most frequently occurring value"
• data: nominal data, ordinal data, metric data
• adequacy: robust against outliers
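A short sketch of the three measures with Python's standard library. All values are hypothetical and chosen so that one extreme value is present; they are not the AZ.sav data.

```python
from statistics import mean, median, mode

employees = [3, 7, 7, 12, 18, 25, 40, 300]  # hypothetical metric data with one extreme value
position = [1, 1, 1, 2, 2, 3, 3]            # hypothetical ordinal codes (1 = lower ... 3 = upper)
sex = [1, 1, 2, 1, 2]                       # hypothetical nominal codes

# Metric data: all three measures are defined.
mean_employees = mean(employees)
median_employees = median(employees)
mode_employees = mode(employees)

# Ordinal data: median and mode only. Nominal data: mode only.
median_position = median(position)
mode_position = mode(position)
mode_sex = mode(sex)

print("employees:", mean_employees, median_employees, mode_employees)
print("position: ", median_position, mode_position)
print("sex:      ", mode_sex)
```

The single extreme value pulls the mean far above the median, which is the lack of robustness mentioned above.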
2. Descriptive Statistics – Central Tendency / Practical
Compute the mean, median and mode of the variables SEX, MITARB (number of employees) and POSITION. Make sure each measure is applied only where it is meaningful!
Hint: sort the variables MITARB and POSITION in ascending order.
2. Descriptive Statistics – Central Tendency / Practical Solution
Variable                      Mean   Median   Mode
MITARB (number of employees)  48.5   18       7
POSITION                      -      2        1
SEX                           -      -        1
2. Descriptive Statistics - Dispersion
Dispersions/ STREUUNGSMASSE
Variance s²
• computation: "average of the squared deviations from the mean", s² = SS / (n - 1) with SS = sum of (xi - x̄)²
• data: metric data
Standard deviation s
• computation: "square root of the variance"
• data: metric data
Range / Spannweite
• computation: "highest value minus lowest value"
• data: ordinal data, metric data
2. Descriptive Statistics - Dispersion
Dispersions/ STREUUNGSMASSE
Interquartile range IQR
• computation: "difference between third and first quartile", IQR = Q3 - Q1
• 1st quartile (Q1): 25% of the cases fall below this value
• median (Q2): 50% of the cases fall above and below this value
• 3rd quartile (Q3): 75% of the cases fall below this value
• data: metric data
• adequacy: robust against outliers
[Figure: boxplot with min, Q1, Q2 = median, Q3 and max; each of the four sections contains 25% of the cases, and the box from Q1 to Q3 is the IQR]
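A sketch of the IQR and its robustness. The values are hypothetical, and quartile conventions differ between programs; this example uses the default method of Python's `statistics.quantiles`, which need not match the SPSS result.

```python
from statistics import quantiles

incomes = [200, 900, 1200, 1500, 1600, 1900, 2800]            # hypothetical values
incomes_with_outlier = [200, 900, 1200, 1500, 1600, 1900, 28000]


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _q2, q3 = quantiles(values, n=4)
    return q3 - q1


def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)


iqr_plain = iqr(incomes)
iqr_outlier = iqr(incomes_with_outlier)
range_plain = value_range(incomes)
range_outlier = value_range(incomes_with_outlier)

print("IQR:  ", iqr_plain, "->", iqr_outlier)
print("range:", range_plain, "->", range_outlier)
```

The extreme value changes the range drastically but leaves the IQR untouched.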
2. Descriptive Statistics – Dispersion / Practical Solution
Computing the variance, standard deviation and range of the variable NETTO (net income): n = 15, mean = 1553.33, min = 200, max = 2800
Variable             Variance     Standard deviation   Range (Spannweite)   Min   Max
NETTO (net income)   471595.238   686.728              2600                 200   2800
Steps (Variance):
1. compute distance between each value and the mean
2. square each discrepancy
3. sum the squares to get the Sum of Squares (SS) value
4. divide the SS by n - 1
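The four steps can be followed line by line in a small sketch. The five values are hypothetical and are not the NETTO data from AZ.sav.

```python
from math import sqrt

values = [200, 1200, 1500, 1700, 2800]  # hypothetical values
n = len(values)
mean_value = sum(values) / n

# Step 1: distance between each value and the mean
deviations = [x - mean_value for x in values]
# Step 2: square each discrepancy
squared_deviations = [d ** 2 for d in deviations]
# Step 3: sum the squares to get the Sum of Squares (SS)
sum_of_squares = sum(squared_deviations)
# Step 4: divide the SS by n - 1
variance = sum_of_squares / (n - 1)

standard_deviation = sqrt(variance)     # square root of the variance
value_range = max(values) - min(values)  # highest value minus lowest value

print("SS =", sum_of_squares)
print("variance =", variance)
print("standard deviation =", round(standard_deviation, 3))
print("range =", value_range)
```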
Correlation
"A correlation is a single number that describes the degree of relationship between two variables"
correlation coefficient: -1 ≤ r ≤ 1; the higher the absolute value of r, the stronger the relationship between the variables
• uncorrelated: r = 0
• positive correlation: r > 0, positive relationship; the higher the x-values, the higher the y-values on average
• negative correlation: r < 0, negative relationship; the higher the x-values, the lower the y-values on average, and vice versa
• exact linear correlation: r = 1 (positive), r = -1 (negative)
Correlation - Example
Example:
Is there a relationship between the variable "Nettoverdienst" (net income) and the variable "Arbeitszufriedenheit" (job satisfaction)?
If yes, …
1. Which type of relationship?
2. How strong is the relationship?
3. Is the correlation significant?
Descriptive statistics for "Nettoverdienst" and "Arbeitszufriedenheit":
Variable               Mean      StDev     Variance     Sum     Min   Max    Range
Nettoverdienst         1553.33   686.728   471595.238   23300   200   2800   2600
Arbeitszufriedenheit   5.20      2.178     4.743        78      1     9      8
Example - Descriptive Statistics
[Figures: boxplot of Arbeitszufriedenheit (AZ); boxplot of monthly net income in €]
Example – 1. Which type of relationship?
Example – 2. How strong is the relationship?
Product-Moment Correlation (Pearson)
• variables (x, y) are metric and normally distributed
Calculating the correlation
[Figure: SPSS output, correlation AZ/NETTO]
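The calculation can be sketched as follows, using the standard product-moment formula (sum of the cross-products of the deviations divided by the square root of the product of the two sums of squares). The function name `pearson_r` and the five value pairs are hypothetical; the result is therefore not the r = 0.692 of the AZ/NETTO example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of the cross-products of the deviations from the means
    cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squares of each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross_products / sqrt(ss_x * ss_y)


netto = [800, 1200, 1500, 2000, 2500]  # hypothetical net incomes
az = [3, 4, 6, 5, 8]                   # hypothetical job satisfaction scores

r = pearson_r(netto, az)
print("r =", round(r, 3))
```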
Example – Q-Q Plot
[Figures: Q-Q plot of AZ (Arbeitszufriedenheit); Q-Q plot of monthly net income in €]
Example – 3. Is the correlation significant?
Testing the Significance of a Correlation
Null Hypothesis: r = 0
Alternative Hypothesis: r ≠ 0
Steps:
1. determine the significance level (alpha-level): α = 0.05
2. compute the degrees of freedom: df = N - 2 = 15 - 2 = 13
3. one-tailed or two-tailed test? Here: two-tailed test
4. look up the critical value
Example – 3. Is the correlation significant?
Excerpt: t-distributions for product-moment correlations
correlation is significant: r (0.692) > r_crit (0.514)
[Figure: SPSS output, correlation AZ/NETTO]
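The critical correlation can be reproduced from the t-distribution. This sketch assumes the standard relationship t = r * sqrt(df / (1 - r²)) and takes the two-tailed critical t-value for α = 0.05 and df = 13 (2.160) from a standard t table; r and N come from the example.

```python
from math import sqrt

r = 0.692       # correlation AZ/NETTO from the example
n = 15          # sample size
df = n - 2      # degrees of freedom
t_crit = 2.160  # two-tailed critical t for alpha = 0.05, df = 13 (standard t table)

# Test statistic for H0: r = 0
t_value = r * sqrt(df / (1 - r ** 2))

# Critical correlation that corresponds to the critical t-value
r_crit = t_crit / sqrt(t_crit ** 2 + df)

significant = abs(t_value) > t_crit

print("t =", round(t_value, 3))
print("r_crit =", round(r_crit, 3))
print("significant:", significant)
```

Comparing t with its critical value and comparing r with r_crit are two views of the same decision.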
Correlation Matrix
• symmetric matrix
• relationships between all possible pairs of variables, e.g. between C1, …, C10: N*(N-1)/2 = 10*9/2 = 45 unique correlations
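The count can be checked by listing the pairs. The variable names C1 to C10 follow the example; the rest is just an illustration.

```python
from itertools import combinations

variables = [f"C{i}" for i in range(1, 11)]  # C1, ..., C10

# Every unordered pair appears once: this is one triangle of the symmetric matrix.
pairs = list(combinations(variables, 2))

n = len(variables)
unique_correlations = n * (n - 1) // 2

print(len(pairs), unique_correlations)
```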
Other correlations
• Pearson Product-Moment Correlation (bivariate normal distribution, variables on interval scale)
• Spearman Rank-Order Correlation (rho) (two ordinal variables)
• Kendall Rank-Order Correlation (tau) (two ordinal variables)
• Point-Biserial Correlation (one variable is on a continuous interval level and the other is dichotomous)
Literature
Basic literature:
Trochim, W. & Donnelly, J.: The Research Methods Knowledge Base (3rd edition). Atomic Dog. URL: http://www.socialresearchmethods.net/kb/ (version current as of October 20, 2006).
Bortz, J., Döring, N. (2006). Forschungsmethoden und Evaluation. Heidelberg: Springer Verlag.
Hatzinger, R. (2006). Angewandte Statistik mit SPSS. Wien: Facultas.
Hatzinger, R., Nagel, H. (2009). PASW Statistics. Statistische Methoden und Fallbeispiele. München: Pearson Studium.
Nagel, H. (2003). Empirische Sozialforschung.