Statistics: Analyzing 2 Quantitative Variables
MIDDLE SCHOOL LEVEL
Session #2 Presented by: Dr. Del Ferster
Immaculata Week 2014
July 28 to August 1, 2014
Are 2 quantitative variables always related?
If there is a strong trend, can we assume a cause and effect status?
Why is linear regression important?
What does it let us do?
Some questions to get us started
We’re going to spend time today on QUANTITATIVE STATISTICS.
We’ll examine scatter plots and look for patterns and strength of relationships.
What’s in store for today?
We’ll interpret regression lines in the context of the problem.
We’ll look at correlation—a measure of the linear trend of the data.
I’ve also included a “spiffy” activity that I think you can use with your students.
What’s in store for today?
Analyzing Scatterplots
SCATTER PLOTS
Scatterplot
A graphical display of two quantitative variables
We plot the explanatory (independent) variable on the x-axis and the response (dependent) variable on the y-axis
Each dot represents a single observation and its ordered pair (x,y)
Describing Scatterplots
When we consider scatterplots, we focus on 4 things:
◦ Direction
◦ Form
◦ Scatter
◦ Unusual elements
Direction Positive: as values of the explanatory variable increase, values in the response variable tend to increase
As x gets larger, y gets larger
Direction Negative: as values of the explanatory variable increase, values in the response variable tend to decrease
As x gets larger, y gets smaller
Direction Null: no discernible pattern of change in the response variable
Form (Shape) Linear: The shape has the appearance of a linear relationship.
There doesn’t have to be a perfect fit.
Form
Curved: We can sometimes use logarithms to transform curved data into a linear form.
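Here is a small sketch of that idea. The data are made up for illustration (an exponential pattern, y = 3·2^x); nothing here comes from the session's plots. The raw y values climb by bigger and bigger steps (a curve), but the steps in log(y) are all the same size (a line).

```python
import math

# Illustrative data only (an assumption, not from the slides): y = 3 * 2**x
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * 2 ** x for x in xs]

# Step sizes between consecutive y values: they keep growing, so the plot curves.
raw_steps = [ys[i + 1] - ys[i] for i in range(len(ys) - 1)]

# Step sizes after taking the natural log of y: constant, so the plot is a line.
log_ys = [math.log(y) for y in ys]
log_steps = [log_ys[i + 1] - log_ys[i] for i in range(len(log_ys) - 1)]

print(raw_steps)
print([round(step, 4) for step in log_steps])
```

Not every curved pattern straightens out this way; a log transform works best when the data grow (or shrink) by a roughly constant percentage.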
Form None No discernible form
Strength (Scatter)
Strong association: very little scatter
Strength Moderate strength:
Strength Weak strength: lots of scatter
Unusual Features Outliers—They just don’t fit the trend
Unusual Features Look for changes in the scatter. A horn shape:
Let’s check some out
How would you describe the following plots?
The scatterplot shows a moderately strong, negative association.
There is a bit of a curve.
#1
The scatterplot shows a weak, positive linear association.
The scatter tends to decrease as the scores in Exam 1 increase.
#2
The scatterplot shows a moderately strong, positive linear association.
There appears to be an outlier around (9, 35).
#3
The scatterplot shows no apparent association.
There is great scatter and an outlier around (60,8).
#4
The scatterplot shows a curved form in which the scatter increases as the explanatory variable increases
#5
Linear Regression
Determining the LINE that best fits our data.
Regression Line
A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.
A regression line summarizes the relationship between two variables, but only in a specific setting: when one of the variables helps explain or predict the other.
Regression Line
We often use a regression line to predict the value of y for a given value of x.
Regression, unlike correlation, requires that we have an explanatory variable and a response variable
Regression Line
Fitting a line to data means drawing a line that comes as close as possible to the points.
Extrapolation: the use of a regression line for prediction far outside the range of values of the explanatory variable x that you used to obtain the line.
◦ Such predictions are often not accurate.
Linear Regression
Regression analysis finds the equation of the line that best describes the relationship between the two variables.
In other words, which line best fits the data represented on our scatterplot?
While there are formulas to calculate this line, most of the time we’d use a graphing calculator or an app for our iPad.
Least-Squares Regression Line
The equation of the least-squares regression line of y on x (or more simply, the regression line) is
$\hat{y} = a + bx$
slope: $b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$
y-intercept: $a = \bar{y} - b\bar{x}$
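For anyone curious about what the calculator is doing behind the scenes, here is a minimal sketch of the slope and intercept formulas in Python. The function name `least_squares_line` and the four toy points are assumptions made for illustration; the points were chosen to lie exactly on y = 1 + 2x so the answer is easy to check by eye.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x using the summation formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # slope: b = (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # y-intercept: a = y-bar - b * x-bar
    a = sum_y / n - b * (sum_x / n)
    return a, b


# Toy points (hypothetical) that sit exactly on y = 1 + 2x
toy_xs = [1, 2, 3, 4]
toy_ys = [3, 5, 7, 9]
a, b = least_squares_line(toy_xs, toy_ys)
print(a, b)
```

With real data the points will not sit exactly on a line, and the same function then returns the line that makes the sum of squared vertical misses as small as possible.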
Interpreting our line
The slope, b, is the amount by which y changes when x increases by one unit.
The intercept, a, is the value of y when $x = 0$.
LINEAR REGRESSIONLet’s look at an example
It seems that people are living longer these days (I hope so!), so I’ve done some research to study this trend. According to US Government statistics, the following data represent the life expectancy for an infant born in the given year.
Linear Regression Example
Year of Birth         | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009
Life Expectancy (yrs) | 77.9 | 78.2 | 78.5 | 79.0 | 79.2 | 79.7 | 80.1 | 80.2 | 80.6
Scatter Plot of our Data
More information on our data
Regression Line: $\hat{y} = -612.46 + 0.345x$
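If you would like to check the regression line without a graphing calculator, here is a rough sketch in Python using the nine data points from the example. It uses the "deviations from the mean" version of the slope formula, which is algebraically the same as the summation version but behaves better with large x values like years. The variable names are my own choices.

```python
# Data from the example: year of birth and life expectancy (years)
years = [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009]
life_expectancy = [77.9, 78.2, 78.5, 79.0, 79.2, 79.7, 80.1, 80.2, 80.6]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(life_expectancy) / n

# Slope: sum of (x - x-bar)(y - y-bar) divided by sum of (x - x-bar)^2
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, life_expectancy))
s_xx = sum((x - mean_x) ** 2 for x in years)
slope = s_xy / s_xx

# Intercept: y-bar minus slope times x-bar
intercept = mean_y - slope * mean_x

print(f"y-hat = {intercept:.2f} + {slope:.3f}x")
```

Rounded to the same number of places, this should agree with the slope of 0.345 and intercept of -612.46 reported for the example.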
1. Does there appear to be a linear relationship between year of birth and life expectancy?
2. Based on the context of the problem, interpret the y-intercept of the line.
3. Based on the context of the problem, interpret the slope of the line.
Some Questions to ponder
4. According to your trend line, what is the predicted life expectancy for a baby born in 2012?
5. According to your trend line, what is the predicted life expectancy for a baby born in 2050?
6. Why might you be a bit skeptical about your response to the last question?
Some Questions to ponder
1. Yes, there appears to be a strong linear trend. The regression line has a positive slope, so as x (the year of birth) increases, so does y (the life expectancy)
2. The regression line is $\hat{y} = -612.46 + 0.345x$. The y-intercept means that at year 0 an infant’s life expectancy would be -612.46 years. NOTE: in the context of this problem this is MEANINGLESS!
Solution to our Example
3. The regression line is $\hat{y} = -612.46 + 0.345x$. The slope means that for each one-year increase in the year of birth, an infant’s predicted life expectancy increases by 0.345 years.
4. According to the regression line, an infant born in 2012 has a life expectancy of 81.68 years.
5. According to the regression line, an infant born in 2050 has a life expectancy of 94.79 years.
Solution to our Example
6. The year 2050 falls well outside the set of x values (from 2001 to 2009) upon which the regression line is based, so this is EXTRAPOLATION. We’re using our regression line to predict for an x value that is WELL OUTSIDE the set of data that was used to generate the equation of the line. This isn’t good statistical practice, so this prediction should be met with a great deal of skepticism.
Solution to our Example
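The predictions in answers 4 and 5 can be reproduced with a few lines of Python. This is only a sketch: the helper names `predict` and `years_outside_data` are made up for illustration, and the second one simply reports how far an input year sits beyond the 2001 to 2009 data, as a reminder of when we are extrapolating.

```python
# Regression line from the example: y-hat = -612.46 + 0.345x
INTERCEPT = -612.46
SLOPE = 0.345

# Range of birth years that was used to fit the line
FIRST_YEAR, LAST_YEAR = 2001, 2009


def predict(year):
    """Predicted life expectancy for a given year of birth."""
    return INTERCEPT + SLOPE * year


def years_outside_data(year):
    """How many years the input lies beyond the fitted range (0 if inside it)."""
    if year < FIRST_YEAR:
        return FIRST_YEAR - year
    if year > LAST_YEAR:
        return year - LAST_YEAR
    return 0


for year in (2012, 2050):
    print(year, round(predict(year), 2), years_outside_data(year))
```

The year 2012 is only 3 years past the data, while 2050 is 41 years past it, which is why the second prediction deserves far more skepticism.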
Correlation
A way to measure the strength of a LINEAR trend.
CORRELATION, denoted by r, measures the direction and strength of the linear relationship between two quantitative variables.
General Properties
◦ r must be between -1 and 1 (-1 ≤ r ≤ 1).
◦ If r is negative, the relationship is negative.
◦ If r = -1, there is a perfect negative linear relationship (extreme case).
◦ If r is positive, the relationship is positive.
Some facts about CORRELATION
General Properties
◦ If r = 1, there is a perfect positive linear relationship (extreme case).
◦ If r is 0, there is no linear relationship.
◦ r measures the strength of the linear relationship.
◦ If explanatory and response are switched, r remains the same.
◦ r has no units of measurement associated with it.
◦ Scale changes do not affect r.
Some facts about CORRELATION
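A couple of these facts are easy to check numerically. The sketch below assumes the standard Pearson formula for r (the slides do not spell it out) and uses hypothetical height and weight values that are not from the session. It computes r, then switches the two variables, then changes the scale of one of them from inches to centimeters.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r (standard formula; assumed, not shown on the slides)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data for illustration only
heights_in = [60, 62, 65, 68, 70]
weights_lb = [110, 120, 140, 155, 170]

r = correlation(heights_in, weights_lb)

# Switching explanatory and response leaves r the same.
r_switched = correlation(weights_lb, heights_in)

# A scale change (inches to centimeters) leaves r the same.
heights_cm = [2.54 * h for h in heights_in]
r_rescaled = correlation(heights_cm, weights_lb)

print(round(r, 4), round(r_switched, 4), round(r_rescaled, 4))
```

All three printed values should match, and none of them carries a unit, which is a handy talking point with students.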
Relationships between 2 numeric variables
Examples of extreme cases
r = 1 r = 0 r = -1
Scatterplots with correlation values
Guess the Correlation Value #1 through #6
For each scatterplot, choose from:
◦ r = 0.07
◦ r = -0.768
◦ r = -0.944
◦ r = 0.936
◦ r = 0.496
◦ r = 1
Remember: Correlation measures LINEAR TREND only
It is possible for there to be a strong relationship between two variables and still have r ≈ 0.
EXAMPLE
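One way to see this for yourself: take points that lie perfectly on the parabola y = x², using x values balanced around 0. The relationship could not be stronger (y is completely determined by x), yet r comes out as 0. The data and the `correlation` helper below are illustrative assumptions, using the standard Pearson formula.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r (standard formula; assumed, not shown on the slides)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Illustrative data: a perfect U-shaped (quadratic) relationship
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(r)
```

The falling left half of the U and the rising right half cancel each other out, so there is no overall linear trend for r to pick up.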
Another look at our previous example
What would you guess for the value of r for this data?
Another look at our previous example (with the regression line added)
What would you guess for the value of r for this data?
r = 0.996
[Scatter plot with regression line: Life Expectancy (y-axis) versus Year (x-axis), 2000 to 2010]
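To double-check that value of r, here is a short sketch that applies the standard Pearson formula (an assumption, since the slides do not show the formula) to the nine life-expectancy data points from the example.

```python
import math

# Data from the example: year of birth and life expectancy (years)
years = [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009]
life_expectancy = [77.9, 78.2, 78.5, 79.0, 79.2, 79.7, 80.1, 80.2, 80.6]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(life_expectancy) / n

# Building blocks: sums of products and squares of deviations from the means
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, life_expectancy))
s_xx = sum((x - mean_x) ** 2 for x in years)
s_yy = sum((y - mean_y) ** 2 for y in life_expectancy)

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 3))
```

A value this close to 1 matches what the scatter plot shows: the points hug the regression line very tightly.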
◦ Association does not imply causation.
◦ Correlation does not imply causation.
◦ Slope is not correlation.
◦ A scale change does not change the correlation.
Correlation doesn’t measure the strength of a non-linear relationship.
Summary of Correlation
Time for An Activity
Now, let’s see if we can apply some of those things that we learned today.
If you want a blank copy for future use, or if you want a copy of my answers, just let me know.
You’re more than welcome to have one!!
Answers are available
Questions or comments
Remember, you make a difference in kids’ lives every day!!
Challenge your students, support them, and share their successes!!
Wrapping it up for today?
Thanks for your attention, participation, and energy. (I know it’s a long time to sit!)
Head out and hit the links!
If I can be of help during the school year, please don’t hesitate to let me know.
EMAIL:
◦ Here at Immaculata: [email protected]
◦ My home email: [email protected]
PHONE: (610) 369-7344 (HOME) (610) 698-7615 (CELL)
The last slide!!!