marvin-powell
Scatter Diagrams and Linear Correlation
• Chapters 1-3 covered single-variable data
• Examples of two variables: age of person vs. time to master a cell phone task, grade point average vs. time studying, grade point average vs. time playing video games, amount of smoking vs. rate of lung cancer
• Scatter diagram: (x, y) data plotted as individual points
– x: explanatory variable (independent)
– y: response variable (dependent)
• Evaluate scatterplot data
– y vs. x values: shows the relationship between 2 quantitative variables measured on the same individual
Scatter Diagrams and Linear Correlation
• Look at the overall pattern
– Any striking deviations (outliers)?
• Describe by
a) form (linear or curved)
b) direction: positively associated (+ slope) or negatively associated (– slope)
c) strength: how closely the points follow the form
• Examples: age of person vs. time to master a cell phone task, grade point average vs. time studying, grade point average vs. time playing video games, amount of smoking vs. rate of lung cancer
Degrees of correlation
Scatter Diagrams and Linear Correlation
• Tips for drawing a scatterplot
– Scale the axes: the intervals along each axis must be uniform; the scale can differ between the two axes
– Label both axes
– Adopt a scale that uses the entire grid (do not compress the plot into one corner of the grid)
Scatter Diagrams and Linear Correlation
• Correlation coefficient (r)
– Assesses the strength and direction of the linear relationship between x and y
– Unitless
– -1 ≤ r ≤ 1; r = -1 or r = 1 is perfect correlation (all points exactly on the line)
– The closer r is to 1 or -1, the better a line describes the relationship (better fit of the data)
– r > 0: positive association between x and y
– r < 0: negative association between x and y
– x and y are interchangeable in calculating r
– r does not change if either (or both) variables change units (inches to cm, or °F to °C)
Linear and non-linear correlations
Scatter Diagrams and Linear Correlation
• $r = \frac{1}{n-1} \sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right)$
• Using the TI-83: example p. 129 (number of police vs. muggings)
• Cautions: association does not imply causation
– Lurking variables may play a role
– r is only good for linear models
– Correlation between averages is higher than between individual points
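The formula for r can be checked by hand on a small data set. The sketch below is a minimal illustration in Python, not the TI-83 procedure; the height and weight values are hypothetical, invented only for this example. It also checks two of the properties listed earlier: swapping x and y, or changing units, leaves r unchanged.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_std(values):
    # Sample standard deviation (divides by n - 1).
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    # r = 1/(n-1) * sum of (standardized x) * (standardized y)
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_std(xs), sample_std(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data, invented for illustration only.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [115, 120, 140, 155, 160, 180]

r = correlation(heights_in, weights_lb)

# Swapping the roles of x and y should give the same r.
r_swapped = correlation(weights_lb, heights_in)

# Converting inches to centimeters should give the same r.
heights_cm = [2.54 * h for h in heights_in]
r_cm = correlation(heights_cm, weights_lb)

print(round(r, 4), round(r_swapped, 4), round(r_cm, 4))
```

All three printed values should agree, since r is built from standardized values and so does not depend on the units or on which variable is called x.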
Scatter Diagrams and Linear Correlation
• Facts
– No distinction between the x and y variables: the value of r is unaffected by switching x and y
– Both x and y must be quantitative
– Only good for linear relationships
– Not resistant to outliers
• Correlation (r) is not a complete description of 2-variable data; the means and standard deviations of x and y should also be reported
• HW: p. 131 #2, 4, 6, 8 a,b,c, 10 a,b,c, 12 a,b,c. For “c”, use a calculator to compute r
4.2 Least Squares Regression
• Least Squares Regression
– Method for finding a line (best fit) that summarizes the relationship between 2 variables, x (explanatory) and y (response)
– Use the line to predict the value of y for a given x
– Must have a specific response variable y and explanatory variable x (they cannot be switched, as they can for r)
4.2 Least Squares Regression
• Least Squares Regression Line (LSRL)
– Minimizes the sum of the squared errors (in the y-values)
– Error = observed value – predicted value
– Quantity minimized: Σ(y – ŷ)² (y is the actual value; ŷ, called “y hat”, is the predicted value)
– The line of y on x that makes the sum of the squared vertical distances from the data points to the fitted line as small as possible
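To make "as small as possible" concrete, the sketch below computes Σ(y – ŷ)² for a fitted line and for slightly nudged lines. It is only an illustration: the data are hypothetical, and the slope and intercept are obtained from the standard closed-form least-squares formulas, which are assumed here rather than taken from the slide.

```python
def sum_squared_errors(a, b, xs, ys):
    # Sum of (observed y - predicted y)^2 for the line y_hat = a + b*x.
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]

# Closed-form least-squares slope and intercept (assumed standard formulas).
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

best = sum_squared_errors(a, b, xs, ys)

# Nudge the intercept or the slope in either direction and recompute the error.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
nudged = [sum_squared_errors(a + da, b + db, xs, ys) for da, db in nudges]

print(best, nudged)
```

Every nudged line gives a larger sum of squared errors than the fitted line, which is what the least-squares criterion requires.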
4.2 Least Squares Regression
• LSRL equation: ŷ = a + bx
• ŷ: predicted value of y
• Slope: b = r(sy/sx)
• y-intercept: a = ȳ – b·x̄
• x̄ and ȳ are the means of the x and y data, respectively; the point (x̄, ȳ) is on the LSRL
• sx and sy are the standard deviations of the x and y data
• r: correlation
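The slope and intercept formulas above can be applied directly. The following sketch assumes hypothetical study-time and exam-score data (invented for illustration) and uses Python's `statistics` module for the means and sample standard deviations; `predict` is a helper name introduced here, not something from the slides.

```python
import statistics

# Hypothetical data, invented for illustration only.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 61, 70, 77]

n = len(hours_studied)
x_bar = statistics.mean(hours_studied)
y_bar = statistics.mean(exam_score)
s_x = statistics.stdev(hours_studied)  # sample standard deviation
s_y = statistics.stdev(exam_score)

# Correlation from standardized values.
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(hours_studied, exam_score)) / (n - 1)

# LSRL slope and intercept from the slide formulas.
b = r * s_y / s_x
a = y_bar - b * x_bar

def predict(x):
    # Predicted value y_hat = a + b*x (hypothetical helper).
    return a + b * x

print(round(a, 4), round(b, 4), round(predict(x_bar), 4))
```

Predicting at x̄ returns ȳ, which confirms that the point (x̄, ȳ) lies on the line.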
4.2 Least Squares Regression
• TI-83
– Enter the data into L1, L2 (x, y)
– Use STAT CALC, select #8: LinReg(a+bx) to get the best-fit line
• Slope: important for interpreting the data
– Rate of change of y for each unit increase in x
• Intercept: may not be practically important for the problem
4.2 Least Squares Regression
• Plot the LSRL: using the formula ŷ = a + bx, find 2 points on the line
– (x1, ŷ1) and (x2, ŷ2); make sure x1 and x2 are near opposite ends of the data
• Influential observations and outliers
– Influential: extreme in the x-direction; removing an influential point changes the LSRL significantly
– Outliers: extreme in the y-direction; removing one typically does not change the LSRL significantly
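The difference between an influential point and a y-direction outlier can be seen by fitting the line with and without each. This is a sketch on hypothetical points chosen to make the contrast clear (the outlier is deliberately placed at the mean of the x-values); real data will rarely be this clean.

```python
def slope_intercept(points):
    # Least-squares intercept a and slope b for a list of (x, y) points.
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in points)
         / sum((x - x_bar) ** 2 for x, _ in points))
    return y_bar - b * x_bar, b

# Hypothetical data, invented for illustration only.
base = [(1, 2), (2, 3), (3, 5), (4, 6)]
influential = (12, 4)  # extreme in the x-direction
outlier = (2.5, 9)     # extreme in the y-direction, x at the mean of the base data

a_base, b_base = slope_intercept(base)
a_inf, b_inf = slope_intercept(base + [influential])
a_out, b_out = slope_intercept(base + [outlier])

print("base:            ", round(a_base, 3), round(b_base, 3))
print("with influential:", round(a_inf, 3), round(b_inf, 3))
print("with outlier:    ", round(a_out, 3), round(b_out, 3))
```

In this constructed case the influential point flattens the slope sharply, while the y-direction outlier leaves the slope unchanged and only shifts the line upward.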
Coefficient of Determination
• r²: coefficient of determination
• r: describes the strength and direction of a straight-line relationship
• r²: fraction of the variation in the values of y that is explained by the LSRL of y on x
• r = 1, r² = 1: perfect correlation; 100% of the variation is explained by the LSRL
• r = 0.7, r² = 0.49: about 49% of the variation in y is explained by the LSRL
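One way to read "fraction of variation explained" is to compare the squared errors around the fitted line with the total squared variation of y around its mean. The sketch below, on hypothetical data, checks that 1 minus that ratio equals r²; the variable names are chosen here for illustration.

```python
import math

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 77]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = sxy / math.sqrt(sxx * syy)

# Least-squares line y_hat = a + b*x.
b = sxy / sxx
a = y_bar - b * x_bar

# Variation left unexplained by the line (sum of squared residuals).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

explained_fraction = 1 - sse / syy

print(round(r ** 2, 4), round(explained_fraction, 4))
```

The two printed values agree, so r² can be read as the share of the variation in y that the line accounts for.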
Residuals
• Residuals: difference between the observed value and the predicted value
– Residual = y – ŷ
– Mean of the least-squares residuals = 0
• Residual plots: scatterplot of the regression residuals against the explanatory variable (x)
– Useful in assessing the fit of the regression line, i.e., do we have a straight line?
• Linear: uniform scatter
• Curved pattern: indicates the relationship is not linear
• Increasing/decreasing spread: indicates prediction of y will be less accurate for larger x (when the spread grows with x)
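A residual check can be sketched in a few lines. The data below are hypothetical and deliberately curved (y = x²), so the residuals from a straight-line fit show the curved pattern described above while still averaging to zero.

```python
# Hypothetical, deliberately non-linear data (y = x squared).
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Least-squares line y_hat = a + b*x.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

# Residual = observed y - predicted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / len(residuals)

# Text version of a residual plot: residual against x.
for x, e in zip(xs, residuals):
    print(f"x = {x}   residual = {e:+.2f}")
print("mean residual:", round(mean_residual, 10))
```

The residuals are positive at both ends and negative in the middle, a curved pattern that signals a straight line is not an adequate model for these data.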