Bivariate Data Analysis


If the relationship is linear, the residuals plotted against the original x-values would be scattered randomly above and below the horizontal zero line.

A scatter plot of residuals versus the x-values should be boring and have no interesting features, like direction or shape. It should stretch horizontally with about the same amount of scatter throughout. It should show no curves or outliers.
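As a minimal sketch of the check described above, the following plain-Python snippet fits a least-squares line and computes the residuals. The data values and the helper names fit_line and residuals are illustrative assumptions, not taken from the slides.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys):
    """Observed y minus the y predicted by the fitted line, for each point."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical, roughly linear data (not from the slides)
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
res = residuals(xs, ys)

# For a linear relationship the residuals flip sign with no obvious pattern
for x, e in zip(xs, res):
    print(x, round(e, 3))
```

Plotting res against xs with any charting tool gives the residual plot; with these made-up values the points sit on both sides of zero with no visible trend.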

r = 0.87 suggests a strong linear relationship between x and y.

However, the scatter plot below shows that the relationship is clearly non-linear.

When examining residuals to check whether a linear model is appropriate, it is usually best to plot them.

The variation in the residuals is the key to assessing how well the model fits.

The pattern of residuals looks more like a parabola. This indicates that the data were not really linear, but were more likely to be quadratic.
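A small sketch of how a high correlation can hide a curve: the data below are hypothetical and exactly quadratic (y = x^2), yet r is still large. The helper name line_fit_summary is an assumption made for this illustration.

```python
import math


def line_fit_summary(xs, ys):
    """Return (r, residuals) for the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    res = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return r, res


# Hypothetical data: y = x^2 for x = 0..10 (not the data on the slide)
xs = list(range(11))
ys = [x ** 2 for x in xs]
r, res = line_fit_summary(xs, ys)

# Residual signs from left to right: positive, then negative, then positive
signs = "".join("+" if e > 0 else "-" for e in res)
print(f"r = {r:.3f}")
print(signs)
```

The run of positive, negative, then positive residuals is the parabola shape described above; a linear relationship would give signs in no particular order.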

Discuss this data.

Discuss this situation.

Outlier?

Discuss the plot of the residuals

Discuss this scatter plot

Linear?

Residuals

Useful website

• http://stat-www.berkeley.edu/~stark/Java/Correlation.htm (plots residuals, regression lines, etc.)

Many of our tools for displaying and summarizing data work only when the data meet certain conditions.

We cannot use a linear model unless the relationship between two variables is linear.

Often re-expression can save the day, straightening bent relationships so that we can fit and use a simple linear model.

Displays of the residuals can often help you find subsets in the data.

When a scatterplot shows a curved form that consistently increases or decreases, we can often straighten the form of the plot by re-expressing one or both of the variables.

The correlation is 0.979. That sounds pretty high, but the scatter plot shows something is not quite right.

Re-expressing f/stop speed by squaring straightens the plot.

This plot looks ‘straight’. The correlation is now 0.998, but the increase in correlation is not important. (The original value of 0.979 is already large.) What is important is that the form of the plot is now straight, so the correlation is now an appropriate measure of association.
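The effect of squaring can be sketched in a few lines. The slides do not list the underlying data, so the f/stop and shutter-speed values below are assumed, typical camera settings chosen only for illustration; correlation is a helper written for this sketch.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Assumed illustrative settings (the slide does not give its data)
f_stop = [2.8, 4, 5.6, 8, 11, 16, 22, 32]
shutter = [1 / 1000, 1 / 500, 1 / 250, 1 / 125, 1 / 60, 1 / 30, 1 / 15, 1 / 8]  # seconds

r_raw = correlation(f_stop, shutter)
# Re-express the x-variable by squaring it, then recompute r
r_reexpressed = correlation([f ** 2 for f in f_stop], shutter)

print(f"r before re-expression: {r_raw:.3f}")
print(f"r after squaring f/stop: {r_reexpressed:.3f}")
```

As the text stresses, the change in r is the less important result; the point of re-plotting shutter speed against the squared f/stop is to see whether the form has become straight.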

Goals of re-expression

• Make the distribution (as seen in its histogram, for example) more symmetric.

• Make the form of the scatter plot more nearly linear.

• Make the scatter in a scatter plot spread out evenly rather than following a fan shape.

Some hints

• Try y^2 for unimodal distributions skewed to the left.

• Try the square root of y for counted data.

• Try logs for measurements that can’t be negative, especially when they grow by percentage increases.

• Try -1/y or -1/(square root of y).

• Logs straighten exponential trends and pull in a long right tail.

• Logs straighten power curves.
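The two log hints can be sketched as follows, using made-up data that follow an exact exponential trend and an exact power curve. Taking logs of y straightens the first; taking logs of both x and y straightens the second. All values and names here are assumptions for illustration.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


xs = list(range(1, 11))

# Hypothetical exponential trend: grows by 50% per step
y_exp = [3 * 1.5 ** x for x in xs]
r_exp_raw = correlation(xs, y_exp)
r_exp_log = correlation(xs, [math.log(y) for y in y_exp])  # log y versus x

# Hypothetical power curve: y = 2 * x^3
y_pow = [2 * x ** 3 for x in xs]
r_pow_raw = correlation(xs, y_pow)
r_pow_loglog = correlation([math.log(x) for x in xs],
                           [math.log(y) for y in y_pow])  # log y versus log x

print(r_exp_raw, r_exp_log)
print(r_pow_raw, r_pow_loglog)
```

Because the made-up data are exact, the re-expressed correlations come out at 1 up to rounding; real data would only be approximately straight.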

Try y versus x^2

Try log or 1/x

Don’t stray too far from the powers suggested. Taking a high power may artificially inflate R^2, but it won’t give a useful or meaningful model. It is better to stick with powers between 2 and -2. Even in that range you should prefer the simpler powers in the ladder to those in the cracks. A square root is easier to understand than the 0.413 power.
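One way to try only the simple powers is to loop over a short ladder and compare how straight each re-expression looks. This is a rough sketch under stated assumptions: the ladder values, the convention that 0 stands for the logarithm, the data, and the function names are all chosen for illustration, and r is used only as a quick screen, not as a substitute for looking at the plot.

```python
import math

# Simple rungs only; 0 stands for the logarithm (an assumed convention here)
LADDER = [2, 1, 0.5, 0, -0.5, -1]


def re_express(y, power):
    """Re-express a positive value y at one rung of the ladder."""
    if power == 0:
        return math.log(y)
    if power < 0:
        return -(y ** power)  # minus sign keeps the direction of the relationship
    return y ** power


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def straightest_power(xs, ys):
    """Return the ladder power whose re-expression of y gives the largest |r|."""
    return max(LADDER,
               key=lambda p: abs(correlation(xs, [re_express(y, p) for y in ys])))


# Hypothetical data where y grows like x^2, so the square root should straighten it
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x ** 2 for x in xs]
best = straightest_power(xs, ys)
print(best)
```

Restricting the search to a handful of interpretable powers reflects the advice above: the candidates stay between 2 and -2 and never include values "in the cracks".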

Comparing histograms and scatter graphs

The data in the scatter plot below shows the progression of the fastest times for the men’s marathon since the Second World War.

We may want to use this data to predict the fastest time at 1 January 2010 (i.e. 64 years after 1 January 1946).

Possible solutions

• A quadratic (y = ax^2 + bx + c)

• An exponential function (y = ae^(bx))

• A power function (y = ax^b)

• Two separate straight lines: one for, say, 0–23 years and one for, say, 23–60 years

• A line for only the later years, say 23–60 years
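The exponential and power candidates can both be fitted with ordinary linear tools after taking logs. A short derivation, assuming a > 0 and y > 0 (and x > 0 for the power function, so the year x = 0 would have to be excluded or shifted):

```latex
\begin{align*}
y = a e^{bx} &\;\Longrightarrow\; \ln y = \ln a + b\,x
  && \text{(plot } \ln y \text{ against } x\text{)} \\
y = a x^{b}  &\;\Longrightarrow\; \ln y = \ln a + b \ln x
  && \text{(plot } \ln y \text{ against } \ln x\text{)}
\end{align*}
```

In each case the slope of the straightened plot estimates b and the intercept estimates ln a.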

Quadratic

• Curve seems to fit

• R^2 = 0.9592 is very high

• Inappropriate to quote r as it is not linear

• The fitted curve eventually turns upward, so predicted times start increasing (not sensible)
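Why the quadratic eventually predicts slower times can be seen from its turning point. The slides do not give the fitted coefficients, so the result is left in symbols, assuming a > 0 (a curve that opens upward):

```latex
\[
\frac{dy}{dx} = 2ax + b = 0
\quad\Longrightarrow\quad
x^{*} = -\frac{b}{2a}, \qquad a > 0 .
\]
```

For any x beyond x* the fitted value of y increases, so predictions made past that point cannot describe record times, which can only stay the same or fall.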

Exponential

• Doesn’t fit the data points particularly well

Power Function

• Reasonable fit

• R^2 = 0.9401 is high

Line for only the later years (1969-2003)

• Line (1969-2003): reasonable fit

• R^2 is high

• Note: We use only the later-years line for the prediction and ignore the earlier years
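A sketch of the later-years approach: keep only the points from year 23 onward, fit a line to them, and evaluate it at x = 64. The record values below are invented placeholders, not the real marathon times, and fit_line is a helper written for this illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented (years since 1 January 1946, fastest time in minutes) pairs,
# NOT the real record progression
records = [(1, 145.0), (7, 138.0), (14, 135.0), (19, 132.0), (23, 128.5),
           (35, 128.0), (42, 127.0), (52, 126.0), (57, 125.0)]

# Ignore the earlier years: keep only points from year 23 (1969) onward
later = [(x, y) for x, y in records if x >= 23]
slope, intercept = fit_line([x for x, _ in later], [y for _, y in later])

# Predict the fastest time at 1 January 2010, i.e. x = 64
prediction_2010 = slope * 64 + intercept
print(round(prediction_2010, 2))
```

The prediction at x = 64 lies only a few years beyond the last data point, which is why a line fitted to the recent, nearly straight part of the plot is a reasonable choice here.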

The data in the scatter plot below come from a random sample of 60 models of new cars taken from all models on the market in New Zealand in May 2000. We want to use the engine size to predict the weight of a car.

• Seems to be linear for engine sizes less than 2500cc.

• Very weak or no linear relationship for engine sizes over 2500cc.

• Solution: Fit a line for engine sizes less than 2500cc.
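A sketch of fitting a line to the smaller engines only and refusing to predict outside that range. The car data are invented placeholders, not the New Zealand sample, and the names CUTOFF_CC, fit_line and predict_weight are assumptions made for this illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


CUTOFF_CC = 2500

# Invented (engine size in cc, weight in kg) pairs, NOT the survey data
cars = [(1000, 850), (1300, 950), (1600, 1080), (1800, 1150), (2000, 1250),
        (2400, 1400), (3000, 1500), (3500, 1450), (4000, 1600)]

# Fit the line only to cars with engines under the cutoff
small = [(cc, kg) for cc, kg in cars if cc < CUTOFF_CC]
slope, intercept = fit_line([cc for cc, _ in small], [kg for _, kg in small])


def predict_weight(engine_cc):
    """Predict weight in kg, but only inside the range the line was fitted on."""
    if engine_cc >= CUTOFF_CC:
        raise ValueError("model was fitted only for engines under 2500cc")
    return slope * engine_cc + intercept


print(round(predict_weight(1500)))
```

Raising an error for larger engines makes the restriction explicit: the line describes the relationship only where the scatter plot looked linear.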
