
04/19/23 330 Lecture 2 1

STATS 330: Lecture 2


Housekeeping matters

• STATS 762
– An extra test (details to be provided)
– An extra assignment

• Class rep – ????

• Office hours
– Alan: 10:30 – 12:00 Tuesday and Thursday in Rm 265, Building 303S
– Tutors: TBA

• Assignment 1: Due 2 August


Today’s lecture: Exploratory graphics

Aim of the lecture:

To give you a quick overview of the kinds of graphs that can be helpful in exploring data. Some of this material has been covered in 201/8. (We will discuss the R code used to make the plots in Lecture 4.)


Exploratory Graphics: topics

• Exploratory graphics for a single variable: aim is to show the distribution of values
– Histograms
– Kernel density estimators
– QQ plots

• For 2 variables: aim is to show relationships
– Both continuous: scatter plots
– One of each: side-by-side boxplots
– Both categorical: mosaic plots (see Ch 5)

• 3 variables
– Pairs plot
– Rotating plot
– Coplots
– 3D plots, contour plots


Single variable: Exchange rate data

• The data of interest: daily changes in log(exchange rate) for the US$/Kiwi.
– Monthly data from June 1986 to May 2012
– Source: Reserve Bank

• Questions:
– What is the distribution of the daily changes in the logged exchange rate?
– Is it normal? If not, how is it different?


$y_t$ = exchange rate at time $t$

Difference in logs is $\log(y_t) - \log(y_{t-1}) = \log(y_t / y_{t-1})$
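
A minimal sketch of this calculation in R. The vector y below holds made-up exchange rates (not the Reserve Bank data), and the names Diff.in.Logs.demo and log.ratio are hypothetical, chosen only for illustration.

```r
# Hypothetical exchange rates (made-up values, not the Reserve Bank data)
y <- c(0.55, 0.56, 0.54, 0.57, 0.58)

# Difference in logs: log(y_t) - log(y_{t-1})
Diff.in.Logs.demo <- diff(log(y))

# The same quantity written as the log of the ratio y_t / y_{t-1}
log.ratio <- log(y[-1] / y[-length(y)])

# The two versions agree up to rounding error
print(all.equal(Diff.in.Logs.demo, log.ratio))
```

Note that n exchange rates give n - 1 differences.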


Data Analysis

Suppose we have the data (3374 differences in the logs) in an R vector, Diff.in.Logs

# histogram on the density scale
hist(Diff.in.Logs, nclass=100, freq=FALSE)
# add density estimate
lines(density(Diff.in.Logs), col="blue", lwd=2)
# add fitted normal density
xvec = seq(-0.1, 0.1, length=100)
lines(xvec, dnorm(xvec, mean=mean(Diff.in.Logs), sd=sd(Diff.in.Logs)), col="red", lwd=2)


Normal plot

> qqnorm(Diff.in.Logs)

Normal data?

No – the QQ plot indicates that the differences have longer tails than normal, since the plotted points are lower than the line for small values and higher for big ones
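
To see what "longer tails than normal" looks like, here is a sketch on simulated data. It assumes scaled t-distributed draws as a stand-in for Diff.in.Logs (the real data are not reproduced here), and it rebuilds the reference line through the quartiles, which is what qqline draws by default.

```r
set.seed(330)
# Assumption: heavy-tailed simulated values standing in for Diff.in.Logs
x <- 0.01 * rt(3374, df = 3)

qqnorm(x)
qqline(x, col = "red")

# Reference line through the first and third quartiles (as qqline does)
probs <- c(0.25, 0.75)
slope <- diff(quantile(x, probs)) / diff(qnorm(probs))
intercept <- quantile(x, 0.25) - slope * qnorm(0.25)

# Compare extreme sample quantiles with the line at the same normal quantiles
tail.p <- c(0.001, 0.999)
line.at.tails <- unname(intercept + slope * qnorm(tail.p))
sample.at.tails <- unname(quantile(x, tail.p))
print(rbind(line.at.tails, sample.at.tails))
```

For long-tailed data the sample value lies below the line in the lower tail and above it in the upper tail, which is the pattern described above.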


Two Variables: Rats!

Of interest: growth rates of 16 rats, i.e. relationship between weight and time

• Want to explore the relationship graphically.

• Each rat was measured (roughly) every week for 11 weeks

• For weeks 1-6, all rats were on a fixed diet. Diet was changed after week 6.


Two Variables: Rats!

Data set rats.df has variables

– rat (1-16)
– growth (weight in grams)
– day (day since start of study, 11 values, at approximately weekly intervals)
– group (litter, one of 3)
– change (has values 1 or 2: diet was changed after 6 weeks, diet 1 for weeks 1-6, diet 2 for weeks 7-11)


Rats: the data

> rats.df
   growth group rat change day
1     240     1   1      1   1
2     250     1   1      1   8
3     255     1   1      1  15
4     260     1   1      1  22
5     262     1   1      1  29
6     258     1   1      1  36
7     266     1   1      2  43
8     266     1   1      2  44
9     265     1   1      2  50
10    272     1   1      2  57
11    278     1   1      2  64
12    225     1   2      1   1
13    230     1   2      1   8

... More data


Rats (cont)

• Could plot weight (i.e. the variable growth) versus the variable day:

plot(growth ~ day, data=rats.df)

BUT….


[Figure: scatter plot of growth against day for all rats]


Criticisms

• Can’t tell which points belong to which rat

• Seem to be 2 groups of points

• In actual fact, the rats came from 3 different litters. Is this relevant?

• Could do better


More rats: improvements

• Join points representing the same rat with a line

• Use different colours (or different line types e.g. dashed or dotted) for the different litters

• Use a legend
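
One way these three improvements might be coded in base R. This is only a sketch: rats.sim is a simulated, hypothetical stand-in for rats.df (6 rats, 2 per litter), and the colours are an arbitrary choice.

```r
set.seed(1)
# Hypothetical stand-in for rats.df: 6 rats, 2 per litter, 11 weigh-ins each
days <- c(1, 8, 15, 22, 29, 36, 43, 44, 50, 57, 64)
rats.sim <- expand.grid(day = days, rat = 1:6)
rats.sim$group <- (rats.sim$rat + 1) %/% 2      # litters 1, 2, 3
rats.sim$growth <- 200 + 100 * rats.sim$group + 0.5 * rats.sim$day +
  rnorm(nrow(rats.sim), sd = 3)

litter.cols <- c("black", "red", "blue")

# Set up empty axes first, so every rat is drawn on the same scales
plot(growth ~ day, data = rats.sim, type = "n", main = "Growth rate of rats")

# Join the points for each rat with a line, coloured by litter
for (r in unique(rats.sim$rat)) {
  one.rat <- rats.sim[rats.sim$rat == r, ]
  lines(one.rat$day, one.rat$growth, type = "b",
        col = litter.cols[one.rat$group[1]])
}

# Legend identifying the litters
legend("topleft", legend = paste("Litter", 1:3), col = litter.cols, lty = 1)
```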


[Figure: "Growth rate of rats", growth against day, one line per rat, with a legend for Litter 1, Litter 2 and Litter 3]


More improvements

• Plot is too cluttered

• Could plot each rat on a different graph – important to use same scales (axes) for each graph

• This leads to the idea of “Trellis graphics”
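
In R, trellis graphics are provided by the lattice package. A minimal sketch, again using a simulated, hypothetical stand-in (rats.sim) rather than rats.df. By default xyplot uses the same scales in every panel.

```r
library(lattice)

set.seed(2)
# Hypothetical stand-in for rats.df: 6 rats, 11 weigh-ins each
days <- c(1, 8, 15, 22, 29, 36, 43, 44, 50, 57, 64)
rats.sim <- expand.grid(day = days, rat = 1:6)
rats.sim$growth <- 250 + 40 * rats.sim$rat + 0.5 * rats.sim$day +
  rnorm(nrow(rats.sim), sd = 3)

# One panel per rat; the "|" means "conditional on"
trellis.plot <- xyplot(growth ~ day | factor(rat), data = rats.sim, type = "b")
print(trellis.plot)
```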


[Figure: trellis plot of growth against day, one panel per rat, all panels on the same scales]


[Figure: trellis plot of growth against day, panels labelled by rat.within.group and group]


Two variables: one continuous, one categorical

• Insurance data: data on 14,000 insurance claims. Want to explore relationship between the amount of the claim (a continuous variable) and the type of car (a categorical variable).

• Use side-by-side boxplots.
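
A sketch of side-by-side boxplots in base R. The insurance data are not reproduced here, so the variables claim and cargroup below are simulated and hypothetical. Claim amounts are plotted on a log scale because they are strongly right-skewed.

```r
set.seed(3)
# Hypothetical stand-in for the insurance data: 500 claims, 5 car groups
n <- 500
cargroup <- factor(sample(1:5, n, replace = TRUE))
claim <- rlnorm(n, meanlog = 5 + 0.2 * as.integer(cargroup), sdlog = 0.8)

# One boxplot of log(claim) per car group, side by side
bp <- boxplot(log(claim) ~ cargroup,
              xlab = "Car group", ylab = "log(claim amount)")

# bp$stats has one column of five-number summaries per group
print(bp$stats)
```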


[Figure: side-by-side boxplots of Log(ADINCUR) for each CARGROUP (car group), with a loess smooth]


More than 2 variables:

• If all variables are continuous, we can explore the relationships between them using a pairs plot

• If we have 3 variables, a rotating plot is a very useful tool


Example: Cherry trees

> cherry.df
   diameter height volume
1       8.3     70   10.3
2       8.6     65   10.3
3       8.8     63   10.2
4      10.5     72   16.4
5      10.7     81   18.8
6      10.8     83   19.7
7      11.0     66   15.6
8      11.0     75   18.2
9      11.1     80   22.6
10     11.2     75   19.9

... more data – 31 trees in all


Cherry trees: pairs plots

> pairs(cherry.df)

[Figure: pairs plot of diameter, height and volume]
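
A runnable version of the pairs plot. It assumes that R's built-in trees data set (31 black cherry trees, with columns Girth, Height and Volume) can stand in for cherry.df; its first rows appear to match the values shown, with Girth playing the role of diameter. The name cherry.trees is hypothetical.

```r
# Assumption: built-in `trees` data as a stand-in for cherry.df
cherry.trees <- data.frame(diameter = trees$Girth,
                           height   = trees$Height,
                           volume   = trees$Volume)

# Scatter plot of every variable against every other variable
pairs(cherry.trees)

# Correlations summarising the same pairwise relationships
print(round(cor(cherry.trees), 2))
```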


3-d Rotating plots

• The challenge: to represent a 3-dimensional object on a 2-dimensional surface (a TV screen, computer screen etc)

• Traditional method uses projection, perspective

• A powerful idea is to use motion, looking at the 3-d scene from different angles


Perspective


Projection

• Diameter-volume view
• Diameter-height view
• Volume-height view
• Arbitrary view


Cherry trees: rotating plot


Dynamic motion

• By dynamically changing the angle of view, we get a better impression of the 3-dimensional structure of the data

• “Dynamic graphics” is a very powerful tool
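
The idea behind a rotating plot can be sketched with a rotation matrix: turn the 3-d point cloud through an angle, then plot the first two coordinates (the projection onto the screen). This is an illustration only, not the software used in the lecture. It assumes R's built-in trees data as a stand-in for the cherry tree data, and rotate.y is a hypothetical helper.

```r
# Assumption: built-in `trees` data (Girth = diameter, Height, Volume),
# standardised so that all three axes are comparable
pts <- scale(as.matrix(trees))

# Rotation through angle theta about the second (height) axis
rotate.y <- function(theta) {
  matrix(c(cos(theta), 0, -sin(theta),
           0,          1,  0,
           sin(theta), 0,  cos(theta)), nrow = 3, byrow = TRUE)
}

# Four still frames: theta = 0 is the diameter-height view,
# theta = pi/2 is the volume-height view
op <- par(mfrow = c(2, 2))
for (theta in seq(0, pi / 2, length.out = 4)) {
  rotated <- pts %*% rotate.y(theta)
  plot(rotated[, 1], rotated[, 2],
       xlab = "horizontal screen axis", ylab = "height (standardised)",
       main = sprintf("theta = %.2f", theta))
}
par(op)
```

A dynamic plot draws many such frames in quick succession as the angle changes.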


A powerful idea: Coplots

• Coplot shows relationship between x and y for selected values of z (usually a narrow range of z’s)

• By showing separate plots for different z ranges, we can see how the relationship between x and y changes as z changes

• Coplot: conditioning plot, shows relationship between x and y conditional on z (i.e. for fixed z)


Cherry trees: coplots

• To show the relationship between height and volume for different values of diameter:

• Divide the range of diameter (8.3 to 20.6) up into 6 subranges: 8-11, 10.5-11.5, etc

• Draw 6 plots, the first using all data whose diameter is between 8 and 11, the second using all data whose diameter is between 10.5 and 11.5, and so on
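
These steps can be sketched with base R's coplot and co.intervals. The sketch assumes the built-in trees data as a stand-in for cherry.df (the name cherry.trees is hypothetical), and the fitted line in each panel is an added illustration. The subranges that co.intervals computes overlap and hold roughly equal numbers of trees, so they will not be exactly the rounded ranges quoted above.

```r
# Assumption: built-in `trees` data as a stand-in for cherry.df
cherry.trees <- data.frame(diameter = trees$Girth,
                           height   = trees$Height,
                           volume   = trees$Volume)

# Six overlapping subranges of diameter (one row per subrange)
ranges <- co.intervals(cherry.trees$diameter, number = 6, overlap = 0.5)
print(ranges)

# One panel of volume against height per subrange, each with a fitted line
coplot(volume ~ height | diameter, data = cherry.trees,
       given.values = ranges,
       panel = function(x, y, ...) {
         points(x, y, ...)
         if (length(unique(x)) > 1) abline(lm(y ~ x))
       })
```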


[Figure: coplot of volume against height, given diameter]


Interpretation

• Note that the lines are not of the same slope

• This implies that the point configuration is not “planar”

Other 3-d graphs

[Figure: plot of surface (z against row and column) and 3-d scatter plot]

Both can be rotated
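
A sketch of a surface plot seen from two angles, using base R's persp. It assumes R's built-in volcano matrix as an example surface (the data behind the lecture's figure are not given), and the index vectors r.idx and c.idx are hypothetical names.

```r
# Assumption: built-in `volcano` matrix (heights on a grid) as the surface
z <- volcano
r.idx <- seq_len(nrow(z))
c.idx <- seq_len(ncol(z))

op <- par(mfrow = c(1, 2))
# theta turns the surface about the vertical axis; phi sets the viewing height
vt1 <- persp(r.idx, c.idx, z, theta = 30, phi = 30,
             xlab = "row", ylab = "column", zlab = "z")
vt2 <- persp(r.idx, c.idx, z, theta = 120, phi = 30,
             xlab = "row", ylab = "column", zlab = "z")
par(op)
```

Changing theta in small steps gives the effect of rotating the surface.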