Bivariate Data
When two variables are measured on a single experimental unit, the resulting data are called bivariate data.
You can describe each variable individually, and you can also explore the relationship between the two variables.
Chapter 3: Describing Bivariate Data
Graphs for Qualitative Variables When at least one of the variables is
qualitative, you can use comparative pie charts or bar charts..
Variable #1 =
Variable #2 = Do you think that men and women are
treated equally in the workplace?
Do you think that men and women are treated equally in the workplace?
Opinion
Gender
Comparative Bar Charts
Stacked Bar Chart Side-by-Side Bar Chart
Describe the relationship between opinion and gender:
More women than men feel that they are not treated equally in the workplace.
More women than men feel that they are not treated equally in the workplace.
Two Quantitative VariablesWhen both of the variables are quantitative, call one variable x and the other y. A single measurement is a pair of numbers (x, y) that can be plotted using a two-dimensional graph called a scatterplot..
y
x
(2, 5)
x = 2
y = 5
Describing the Scatterplot
Positive linear - strong Negative linear -weak
Curvilinear No relationship
The Correlation Coefficient Assume that the two variables x and y exhibit a
linear pattern or form. The strength and direction of the relationship
between x and y are measured using the correlation coefficient, r..
where
sx = standard deviation of the x’s
sy = standard deviation of the y’s
sx = standard deviation of the x’s
sy = standard deviation of the y’s
yx
xy
ss
sr
yx
xy
ss
sr
1
))((
nn
yxyx
s
iiii
xy 1
))((
nn
yxyx
s
iiii
xy
# of transistors in a CPU and its integer performance.
Example
CPU Model 1 2 3 4 5
x (million transistors) 14 15 17 19 16
y (SPECint) 178 230 240 275 200
•The scatterplot indicates a positive linear relationship.
•The scatterplot indicates a positive linear relationship.
Examplex y xy
14 178 2492
15 230 3450
17 240 4080
19 275 5225
16 200 3200
81 1123 18447
360.37 6.224
924.1 2.16
Calculate
y
x
sy
sx
1
))((
nnyx
yxs
iiii
xy
6.634
5)1123)(81(
18447
yx
xy
ss
sr
885.)36.37(924.1
6.63
Interpreting r
All points fall exactly on a straight line.
Strong relationship; either positive or negative
Weak relationship; random scatter of points
AppletApplet
-1 r 1
r 0
r 1 or –1
r = 1 or –1
Sign of r indicates direction of the linear relationship.
The Regression Line Sometimes x and y are related in a particular way
—the value of y depends on the value of x.• y = dependent variable• x = independent variable
The form of the linear relationship between x and y can be described by fitting a line as best we can through the points. This is the regression line, y = a + bx..• a = y-intercept of the line• b = slope of the line
AppletApplet
The Regression Line To find the slope and y-intercept
of the best fitting line, use:
xbya
s
srb
x
y
xbya
s
srb
x
y
• The least squares
regression line is y = a + bx
x
y
s
srb
xby a
Examplex y xy
14 178 2492
15 230 3450
17 240 4080
19 275 5225
16 200 3200
81 1123 18447
189.179235.1
3604.37)885(.
86.53)2.16(189.176.224
xy 189.1786.53 :Line Regression
885.
3604.376.224
9235.12.16
r
sy
sx
y
x
RecallFrom Previous Example:
Predict:
Example
xy 189.1786.53
Predict the CPU integer performance of a CPU containing 16 million transistors.
)16(189.1786.53 221.16
Nonlinear Regression Not all relationships between two variables are
linear need to fit some other type of function Nonlinear regression deals with relationships
that are NOT linear. For example, polynomial logarithmic and exponential reciprocal
We can use the method of least squares if we can transform the data to make the relationship appear linear (linearization)
When To Use Nonlinear Regression?
Often requires a lot of mathematical intuition Always draw a scatterplot
if the plot looks non-linear, try nonlinear regression
If a nonlinear relationship is suspected based on theoretical information
Relationship must be convertible to a linear form
Types ofCurvilinear Regression
There are many possible types of nonlinear relationships that can be linearized:
Many other forms can be transformed!
y bx ay a b
x
axy b elog( ) y a bx
Transforming to Linear Forms
Example: if the relation between y and x is exponential (i.e., y = x ), we take the logarithms of both sides of the equation to get: log y = log + x ( log
Note that and are constants.
We can perform similar transformations for reciprocal and power functions
Examples:
ln
ln
y x
X x
Y y
Y X
/
1/
y x
X x
Y y
Y X
exp( ) e
ln
ln
xy x
Y y
X x
Y X
ln
ln
ln
y x
Y y
X x
Y X
Review of Logarithmic Functions
The inverse of the exponential function is the natural logarithm function
Ln(exp(x)) = x
Product Rule for Logarithms Ln(a b) = Ln(a) + Ln(b)
Logb x = Ln(x) / Ln(b) (Change of Base) Loge(x) = Ln(x) / Ln(e) = Ln(x)
Log10(x) = Ln(x) / Ln(10)
Key ConceptsI. Bivariate Data
1. Both qualitative and quantitative variables
2. Describing each variable separately3. Describing the relationship between the variables
II. Describing Two Qualitative Variables1. Side-by-Side pie charts2. Comparative line charts3. Comparative bar charts Side-by-Side Stacked
4. Relative frequencies to describe the relationship between the two variables.
Key ConceptsIII. Describing Two Quantitative Variables
1. Scatterplots Linear or nonlinear pattern
Strength of relationship
Unusual observations; clusters and outliers
2. Covariance and correlation coefficient
3. The best fitting line Calculating the slope and y-intercept
Graphing the line
Using the line for prediction