valerie-miles
Regression
Review Regression and Pearson’s r
SPSS Demo
The Regression line
• Properties:
1. The sum of positive and negative vertical distances from it is zero
2. The standard deviation of the points from the line is at a minimum
3. The line passes through the point (mean x, mean y)
• Bivariate Regression Applet
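The three properties above can be checked numerically. The following is a minimal sketch that fits a least-squares line to a small made-up data set; the data and the helper names `fit_line` and `sse` are illustrative assumptions, not part of the slides.

```python
# Illustrative data only (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared vertical distances of the points from the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


a, b = fit_line(xs, ys)
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Property 1: positive and negative vertical distances cancel out.
print(abs(sum(residuals)) < 1e-9)

# Property 2: nudging the intercept or the slope only increases the spread
# of the points around the line.
print(sse(a, b, xs, ys) < sse(a + 0.1, b, xs, ys))
print(sse(a, b, xs, ys) < sse(a, b + 0.1, xs, ys))

# Property 3: the line passes through (mean x, mean y).
print(abs((a + b * mean_x) - mean_y) < 1e-9)
```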
Regression Line Formula
Y = a + bX
Y = score on the dependent variable
X = the score on the independent variable
a = the Y intercept – point where the regression line crosses the Y axis
b = the slope of the regression line
– SLOPE – the amount of change produced in Y by a unit change in X; or,
– a measure of the effect of the X variable on the Y
Regression Line Formula
Y = a + bX
y-intercept (a) = 102
slope (b) = .9
Y = 102 + (.9)X
• This information can be used to predict weight from height.
• Example: What is the predicted weight of a male who is 70” tall (5’10”)?
– Y = 102 + (.9)(70) = 102 + 63 = 165 pounds
[Figure: scatterplot with height (inches) on the x-axis and weight (pounds) on the y-axis]
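The height-to-weight prediction can be written as a one-line function. This is a small sketch using the intercept (102) and slope (.9) given on the slide; the function name `predict_weight` is an illustrative assumption.

```python
def predict_weight(height_in, a=102.0, b=0.9):
    """Predicted weight in pounds from height in inches, using Y = a + bX."""
    return a + b * height_in


# Predicted weight for a male who is 70 inches tall.
predicted = predict_weight(70)
print(round(predicted, 1))
```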
The Slope (b) – A Strength & A Weakness
– Strength: b indicates the change in Y for a unit change in X
– Weakness: b is not a good measure of the strength of a relationship
• It is unbounded (can be >1 or <-1), making it hard to interpret
• The size of b is influenced by the scale that each variable is measured on
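To see the scale problem concretely, the sketch below fits the same made-up data twice, once with height in inches and once in feet. The data and the helper name `slope_and_r` are illustrative assumptions. The slope changes by a factor of 12, while the correlation coefficient stays the same.

```python
def slope_and_r(xs, ys):
    """Return (b, r): the least-squares slope and the correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx
    r = sxy / (sxx * syy) ** 0.5
    return b, r


# Illustrative data only (not from the slides).
heights_in = [60.0, 64.0, 68.0, 72.0, 76.0]
weights_lb = [120.0, 140.0, 155.0, 180.0, 200.0]
heights_ft = [h / 12 for h in heights_in]  # same heights, different unit

b_in, r_in = slope_and_r(heights_in, weights_lb)
b_ft, r_ft = slope_and_r(heights_ft, weights_lb)

print(b_in, b_ft)  # pounds per inch vs. pounds per foot
print(r_in, r_ft)  # unaffected by the change of unit
```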
Pearson’s r Correlation Coefficient
• By contrast, Pearson’s r is bounded
– a value of 0.0 indicates no linear relationship and a value of +/-1.00 indicates a perfect linear relationship
Pearson’s r
Y = 0.7 + .99X
sx = 1.51
sy = 2.24
• Converting the slope to a Pearson’s r correlation coefficient:
– Formula: r = b(sx/sy)
– r = .99(1.51/2.24) = .67
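The conversion is simple enough to express as a helper. This sketch uses the slope and standard deviations from the slide; the function name `slope_to_r` is an illustrative assumption.

```python
def slope_to_r(b, sx, sy):
    """Convert a regression slope to Pearson's r using r = b * (sx / sy)."""
    return b * (sx / sy)


r = slope_to_r(0.99, 1.51, 2.24)
print(round(r, 2))  # 0.67
```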
Coefficient of Determination
• Conceptually, the formula for r2 is: r2 = Explained variation / Total variation
– “The proportion of the total variation in Y that is attributable or explained by X.”
• The variation in Y not explained by X is called the unexplained variation
– Usually attributed to measurement error, random chance, or some combination of other variables
Coefficient of Determination
– Interpreting the meaning of the coefficient of determination in the example:
• Squaring Pearson’s r (.67) gives us an r2 of .45
• Interpretation:
– The # of hours of daily TV watching (X) explains 45% of the total variation in soda consumed (Y)
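The sketch below checks on a small made-up data set that squaring Pearson's r gives the same number as explained variation divided by total variation. The data are illustrative assumptions; the last line squares the r of .67 from the example.

```python
# Illustrative data only (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

b = sxy / sxx                 # slope
a = mean_y - b * mean_x       # intercept
r = sxy / (sxx * syy) ** 0.5  # Pearson's r

predicted = [a + b * x for x in xs]
total_variation = syy
explained_variation = sum((p - mean_y) ** 2 for p in predicted)
unexplained_variation = sum((y - p) ** 2 for y, p in zip(ys, predicted))

r_squared = explained_variation / total_variation
print(round(r_squared, 3), round(r ** 2, 3))  # the two agree

# Explained and unexplained variation add up to the total.
print(abs(explained_variation + unexplained_variation - total_variation) < 1e-9)

# The slide's example: r = .67
print(round(0.67 ** 2, 2))  # 0.45
```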
Another Example: Relationship between Mobility Rate (x) & Divorce rate (y)
• The formula for this regression line is: Y = -2.5 + (.17)X
– 1) What is this slope telling you?
– 2) Using this formula, if the mobility rate for a given state was 45, what would you predict the divorce rate to be?
– 3) The standard deviation (s) for x = 6.57 & the s for y = 1.29. Use this info to calculate Pearson’s r. How would you interpret this correlation?
– 4) Calculate & interpret the coefficient of determination (r2)
[Figure: scatterplot with Mobility Rate on the x-axis and Divorce Rate on the y-axis]
Another Example: Relationship between Mobility Rate (x) & Divorce rate (y)
• The formula for this regression line is: Y = -2.5 + (.17)X
– 1) What is this slope telling you?
• For every one unit increase in x (mobility rate), divorce rate (y) goes up .17
– 2) Using this formula, if the mobility rate for a given state was 45, what would you predict the divorce rate to be?
• Y = -2.5 + (.17)(45) = 5.15
– 3) The standard deviation (s) for x = 6.57 & the s for y = 1.29. Use this info to calculate Pearson’s r. How would you interpret this correlation?
• r = .17(6.57/1.29) = .17(5.093) = .866
• There is a strong positive association between mobility rate & divorce rate.
– 4) Calculate & interpret the coefficient of determination (r2)
• r2 = (.866)2 = .75
• A state’s mobility rate explains 75% of the variation in its divorce rate.
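The arithmetic for this example can be reproduced in a few lines. This sketch uses only the numbers given in the exercise; the variable names are illustrative.

```python
a, b = -2.5, 0.17      # intercept and slope of the regression line
sx, sy = 6.57, 1.29    # standard deviations of mobility rate and divorce rate

# 2) Predicted divorce rate for a mobility rate of 45
predicted_divorce = a + b * 45
print(round(predicted_divorce, 2))  # 5.15

# 3) Pearson's r from the slope
r = b * (sx / sy)
print(round(r, 3))  # 0.866

# 4) Coefficient of determination
r_squared = r ** 2
print(round(r_squared, 2))  # 0.75
```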
PEARSON’S r IN SPSS
– Steps for running Pearson’s r in SPSS:
1. Click Analyze > Correlate > Bivariate
2. Highlight the 2(+) variables you wish to examine
3. Click OK
Pearson’s r Output
• Note that the table reports each correlation coefficient twice
– (3 bivariate relationships, 6 correlation coefficients reported)
• Example interpretation:
– There is a weak to moderate negative relationship (r = -.260) between age at which one’s first child is born (AGEKDBRN) and the number of children one has (CHILDS).
Correlations

                                        AGEKDBRN    CHILDS      CHLDIDEL
AGEKDBRN R'S AGE WHEN 1ST CHILD BORN
  Pearson Correlation                   1           -.260**     -.119*
  Sig. (2-tailed)                       .           .000        .040
  N                                     1035        1015        297
CHILDS NUMBER OF CHILDREN
  Pearson Correlation                   -.260**     1           .276**
  Sig. (2-tailed)                       .000        .           .000
  N                                     1015        1438        442
CHLDIDEL IDEAL NUMBER OF CHILDREN
  Pearson Correlation                   -.119*      .276**      1
  Sig. (2-tailed)                       .040        .000        .
  N                                     297         442         447

** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).
Measures of Association

Level of Measurement    Measure of       “Bounded”?    PRE
(both variables)        Association                    interpretation?
NOMINAL                 Phi              NO*           NO
                        Cramer’s V       YES           NO
                        Lambda           YES           YES
ORDINAL                 Gamma            YES           YES
INTERVAL-RATIO          b (slope)        NO            NO
                        Pearson’s r      YES           NO
                        r2               YES           YES

* But, has an upper limit of 1 when dealing with a 2x2 table.
Significance Testing for 2 IR Variables
• When both variables are interval-ratio level, strength and significance are assessed together
– The slope or “r” will have a “sig value”
• Sig = the probability of obtaining a slope at least this far from zero, assuming the null is correct
• The Null in this case is that there is no relationship between the two variables in the population
– In other words, that the slope (or “r”) in the population is zero
– What are the odds of getting this slope, if in the population, the slope is zero?
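One way to build intuition for the “sig value” is a permutation test: shuffle Y many times so that any real link with X is broken, and count how often a shuffled slope is at least as far from zero as the observed one. This is a sketch of a different technique from the test SPSS reports, and the data and the names `slope` and `permutation_sig` are illustrative assumptions.

```python
import random


def slope(xs, ys):
    """Least-squares slope b for Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def permutation_sig(xs, ys, n_perm=2000, seed=0):
    """Share of shuffles whose slope is at least as extreme as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # simulates the null: no relationship
        if abs(slope(xs, shuffled)) >= observed:
            extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Illustrative data only (not from the slides): a nearly perfect linear trend.
xs = [float(x) for x in range(1, 11)]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]

sig = permutation_sig(xs, ys)
print(sig)  # a small value means the null is hard to believe
```

A small result says that a slope this large would rarely arise if the population slope were zero, which is the same question the slide poses.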
SPSS DEMO
• Are individuals with stronger moral values more likely to engage in criminal activity?
– Sample = 484 UMD Students
– Null hypothesis?
• How to test null?
• Both are I/R variables (or close enough)
– Can test the significance of the measure of strength
– E.g., is the slope/correlation significantly different from zero? Or, what are the odds of finding this slope/correlation, if in the population, the slope is zero?
• Is there a relationship here?
• If so, what direction?
• What (ballpark) would the constant, or “y-intercept”, be in the regression equation?
“Model” Regression Output
• The same generic output is given for all regression models
– The “model” stuff is relevant for models with more than one independent variable
• How do all the independent variables together predict the dependent variable?
– For our purposes, the “Model R” will have the same value as Pearson’s r, apart from the sign
• HOWEVER, the model R cannot tell you direction (compute separate correlation in SPSS)