

Scatter Diagrams and Linear Correlation

• Chapters 1-3: single-variable data
• Examples of two variables: age of person vs. time to master a cell phone task, grade point average vs. time studying, grade point average vs. time playing video games, amount of smoking vs. rate of lung cancer
• Scatter diagram: (x, y) data plotted as individual points
– x – explanatory variable (independent)
– y – response variable (dependent)
• Evaluate scatterplot data
– y vs. x values – shows the relationship between 2 quantitative variables measured on the same individual


Scatter Diagrams and Linear Correlation

• Look at the overall pattern
– Any striking deviations (outliers)?
• Describe the pattern by
a) form (linear or curved)
b) direction – positively associated (positive slope) or negatively associated (negative slope)
c) strength – how closely the points follow the form
• Examples: age of person vs. time to master a cell phone task, grade point average vs. time studying, grade point average vs. time playing video games, amount of smoking vs. rate of lung cancer


Degrees of correlation


Scatter Diagrams and Linear Correlation

• Tips for drawing a scatterplot
– Scale the axes: the intervals along each axis must be uniform; the scale can be different for each axis
– Label both axes
– Adopt a scale that uses the entire grid (do not compress the plot into one corner of the grid)


Scatter Diagrams and Linear Correlation

• Correlation coefficient (r)
– Assesses the strength and direction of the linear relationship between x and y
– Unitless
– -1 ≤ r ≤ 1; r = -1 or r = 1 is perfect correlation (all points exactly on the line)
– The closer r is to 1 or -1, the better a line describes the relationship (better fit of the data)
– r > 0: positive association between x and y
– r < 0: negative association between x and y
– x and y are interchangeable in calculating r
– r does not change if either (or both) variables have unit changes (inches to cm, or °F to °C)
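The symmetry and unit-invariance properties listed above can be checked numerically. The following is a minimal sketch, not the course's own method: the data values and the helper name `correlation` are invented for illustration, and r is computed as the average product of standardized values using sample standard deviations.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation r: sum of products of standardized values over n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical measurements, invented for illustration only.
x_inches = [60.0, 62.0, 65.0, 68.0, 71.0]
y_fahrenheit = [50.0, 55.0, 54.0, 63.0, 68.0]

r_original = correlation(x_inches, y_fahrenheit)

# Switching the roles of x and y leaves r unchanged.
r_swapped = correlation(y_fahrenheit, x_inches)

# Changing units (inches to cm, F to C) also leaves r unchanged.
x_cm = [x * 2.54 for x in x_inches]
y_celsius = [(y - 32.0) * 5.0 / 9.0 for y in y_fahrenheit]
r_converted = correlation(x_cm, y_celsius)

print(round(r_original, 6), round(r_swapped, 6), round(r_converted, 6))
```

All three printed values should agree up to floating-point rounding, because standardizing removes both the units and the distinction between the two variables.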


Linear and non-linear correlations


Scatter Diagrams and Linear Correlation

• $r = \frac{1}{n-1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right)$
• Using TI-83: example p. 129 (number of police vs. muggings)
• Cautions: association does not imply causation
– Lurking variables may play a role
– r is only good for linear models
– Correlation between averages is higher than correlation between individual points
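The caution that r is only good for linear models can be illustrated with a small sketch. The data here are invented: y is an exact function of x (a parabola), yet the correlation formula returns a value of essentially zero because the relationship is not linear. The helper name `correlation` is an assumption for this example.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation r from the standardized-product formula."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical curved data: y depends perfectly on x, but not linearly.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

r_curved = correlation(xs, ys)
print(abs(r_curved) < 1e-12)  # r is zero up to floating-point rounding
```

A value of r near 0 therefore means "no linear association", not "no association"; the scatterplot still has to be inspected.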


Scatter Diagrams and Linear Correlation

• Facts
– No distinction between the x and y variables; the value of r is unaffected by switching x and y
– Both x and y must be quantitative
– Only good for linear relationships
– Not resistant to outliers
• Correlation (r) is not a complete description of 2-variable data; the means and standard deviations of x and y should also be included
• HW: p. 131: 2, 4, 6, 8 a,b,c, 10 a,b,c, 12 a,b,c. For "c", use a calculator to compute r


4.2 Least Squares Regression

• Least Squares Regression
– Method for finding a line (best fit) that summarizes the relationship between 2 variables, x (explanatory) and y (response)
– Use the line to predict the value of y for a given x
– Must have a specific response variable y and explanatory variable x (cannot switch them as with r)


4.2 Least Squares Regression

• Least Squares Regression Line (LSRL)
– Minimizes the sum of squared errors (in the y-values)
– Error = observed value - predicted value
– Quantity minimized: $\sum (y - \hat{y})^2$ (y is the actual value, ŷ is the predicted value; ŷ is called "y hat")
– The line of y on x that makes the sum of the squared vertical distances from the data points to the fitted line as small as possible
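To make the "sum of squared errors" criterion concrete, here is a short sketch that scores two candidate lines on the same data. The data set, the two candidate lines, and the function name `sum_squared_errors` are all hypothetical, chosen only for illustration.

```python
def sum_squared_errors(xs, ys, a, b):
    """Sum of (observed - predicted)^2 for the line y_hat = a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# Two candidate lines, y_hat = a + b * x.
sse_line_1 = sum_squared_errors(xs, ys, a=1.0, b=1.0)
sse_line_2 = sum_squared_errors(xs, ys, a=2.2, b=0.6)

print(sse_line_1, round(sse_line_2, 6))
```

The second line has the smaller sum of squared errors on this data, so by the least squares criterion it is the better of the two candidates.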


4.2 Least Squares Regression

• LSRL equation: $\hat{y} = a + bx$
• ŷ is the predicted value of y
• Slope: $b = r \, (s_y / s_x)$
• y-intercept: $a = \bar{y} - b\bar{x}$
• $\bar{x}$ and $\bar{y}$ are the means of all the x and y data, respectively; the point $(\bar{x}, \bar{y})$ lies on the LSRL
• $s_x$ and $s_y$ are the standard deviations of the x and y data
• r is the correlation
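The slope and intercept formulas can be applied step by step. The sketch below uses a small invented data set and hypothetical variable names; it computes r, then b = r(s_y/s_x) and a = ȳ - b x̄, and checks that the mean point lies on the resulting line.

```python
from statistics import mean, stdev

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations

# Correlation from the standardized-product formula.
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

b = r * (s_y / s_x)      # slope
a = y_bar - b * x_bar    # y-intercept


def predict(x):
    """Predicted value y_hat = a + b * x."""
    return a + b * x


print(round(a, 4), round(b, 4))            # intercept and slope
print(round(predict(x_bar) - y_bar, 10))   # difference is 0: (x_bar, y_bar) is on the line
```

For this data the slope works out to about 0.6 and the intercept to about 2.2.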


4.2 Least Squares Regression

• TI-83
– Enter the data into L1, L2 (x, y)
– Use STAT CALC, select #8: LinReg(a+bx) to get the required best-fit line
• Slope: important for interpretation of data
– Rate of change of y for each unit increase in x
• Intercept: may not be practically important for problems


4.2 Least Squares Regression

• Plot the LSRL: using the formula ŷ = a + bx, find 2 points on the line
– (x1, ŷ1) and (x2, ŷ2); make sure x1 and x2 are near opposite ends of the data
• Influential observations and outliers
– Influential – extreme in the x-direction; removing an influential point changes the LSRL significantly
– Outliers – extreme in the y-direction; removing one typically does not change the LSRL as much
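The difference between an influential point and a y-direction outlier can be sketched by refitting the line with one extra point added. Everything here is hypothetical: the base data, the two added points, and the helper `fit_line`, which computes the slope as Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², an algebraically equivalent form of b = r(s_y/s_x).

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # same value as r * (s_y / s_x)
    a = y_bar - b * x_bar
    return a, b


# Hypothetical base data, invented for illustration only.
base_x = [1.0, 2.0, 3.0, 4.0, 5.0]
base_y = [2.0, 4.0, 5.0, 4.0, 5.0]

a_base, b_base = fit_line(base_x, base_y)

# One extra point far out in the x-direction (influential).
a_infl, b_infl = fit_line(base_x + [15.0], base_y + [3.0])

# One extra point far out in the y-direction, at the mean of x (outlier).
a_out, b_out = fit_line(base_x + [3.0], base_y + [9.0])

print(round(b_base, 4), round(b_infl, 4), round(b_out, 4))
print(round(a_base, 4), round(a_infl, 4), round(a_out, 4))
```

In this example the point that is extreme in x flips the slope from about 0.6 to slightly negative. The y-direction outlier, placed at the mean of x, leaves the slope unchanged and only shifts the line upward through a larger intercept. An outlier located elsewhere in x would also move the slope somewhat.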


Coefficient of Determination

• r² – coefficient of determination
• r – describes the strength and direction of a straight-line relationship
• r² – fraction of the variation in the values of y that is explained by the LSRL of y on x
• r = 1, r² = 1: perfect correlation; 100% of the variation is explained by the LSRL
• r = 0.7, r² = 0.49: about 49% of the variation in y is explained by the LSRL
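The reading of r² as a "fraction of variation explained" can be checked directly. This sketch, on invented data with hypothetical variable names, computes r² two ways: as 1 minus the ratio of unexplained variation (squared errors around the LSRL) to total variation (squared deviations around ȳ), and as the square of r.

```python
from math import sqrt
from statistics import mean

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

x_bar, y_bar = mean(xs), mean(ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y

b = s_xy / s_xx            # LSRL slope
a = y_bar - b * x_bar      # LSRL intercept

# Variation left unexplained by the line.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1.0 - sse / s_yy
r = s_xy / sqrt(s_xx * s_yy)

print(round(r_squared, 6), round(r ** 2, 6))
```

Both values agree (about 0.6 for this data), meaning the line accounts for roughly 60% of the variation in y.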


Residuals

• Residuals – difference between the observed value and the predicted value
– Residual = y - ŷ
– Mean of the least squares residuals = 0
• Residual plots – scatterplot of the regression residuals against the explanatory variable (x)
– Useful in assessing the fit of the regression line, i.e., do we have a straight line?
• Linear – uniform scatter
• Curved – indicates the relationship is not linear
• Increasing/decreasing spread – indicates prediction of y will be less accurate where the spread is larger (for example, at larger x when the spread increases)
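A residual plot starts from the list of (x, residual) pairs. The sketch below computes those pairs for an invented data set and confirms that the least squares residuals average to zero; the data and variable names are assumptions for illustration, and the plotting step itself is left out.

```python
from statistics import mean

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# Fit the least squares line y_hat = a + b * x.
x_bar, y_bar = mean(xs), mean(ys)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

# Residual = observed y minus predicted y_hat.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# These (x, residual) pairs are the points of the residual plot.
for x, res in zip(xs, residuals):
    print(x, round(res, 4))

print(round(mean(residuals), 10))  # mean of the residuals is 0 up to rounding
```

Plotting the printed pairs with x on the horizontal axis gives the residual plot; a uniform band around zero supports a linear model.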
