Multivariate Methods
Måns Thulin
Department of Mathematics, Uppsala University
Multivariate Methods • 22/3 2011
Basic information
- 10 credit points
- 10 lectures, 3 computer exercises and 2 problem-solving sessions
- The course book is Johnson & Wichern: Applied Multivariate Statistical Analysis, 6th ed., Pearson
- Some reference literature that might be of interest is listed on the course information hand-out
- The course is (informally) divided into four blocks
Block 1: Multivariate data
- Can we visualize 16-dimensional data? How?
- How can we handle multivariate random variables?
- What is the multivariate analogue of the normal distribution?
[Figures: scatter plot of Mortality against SO2; Andrews' curves for the groups setosa, versicolor and virginica; plot of the normal density dnorm(x) against x]
Block 1: Multivariate data
Course goals: In order to pass the course (grade 3) the student should...
- have knowledge of methods for visualizing multivariate data sets
- be familiar with the multivariate normal distribution
We look at ways to describe multivariate data (graphically and numerically) and study the properties of multivariate distributions in general and the multivariate normal distribution in particular.
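As a small illustration of the graphical side: the Andrews' curve of an observation x = (x1, ..., xp) is the function f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ..., so each p-dimensional observation becomes one curve. The sketch below is in Python rather than the course's R. The function name and the example values are made up for illustration.

```python
import math

def andrews_curve(x, t):
    """Value at angle t of the Andrews' curve for the observation x."""
    value = x[0] / math.sqrt(2)
    for i, xi in enumerate(x[1:], start=1):
        k = (i + 1) // 2  # frequencies 1, 1, 2, 2, 3, 3, ...
        # odd positions get a sine term, even positions a cosine term
        value += xi * (math.sin(k * t) if i % 2 == 1 else math.cos(k * t))
    return value

# Illustrative 4-dimensional observation (made-up values).
obs = [5.0, 3.0, 1.5, 0.5]
# Evaluate the curve on a grid over [-pi, pi]; plotting these points draws the curve.
grid = [-math.pi + 2 * math.pi * j / 100 for j in range(101)]
curve = [andrews_curve(obs, t) for t in grid]
```

Drawing one such curve per observation, coloured by group, gives a plot of the kind named in the figure above.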
Block 2: Inference under MND
- Assuming a multivariate normal distribution, how can we test hypotheses?
- Can we perform t-tests and ANOVA?
- How do we know that the data is normal?
[Figures: scatter plots, including x[,2] against x[,1]]
Block 2: Inference under MND
Course goals: In order to pass the course (grade 3) the student should...
- know how to perform statistical tests of the mean vector of a multivariate normal distribution
- know how to perform statistical tests comparing two or several multivariate normal populations
- know methods and techniques for validating the assumption of a multivariate normal distribution
We learn how to estimate the parameters of the multivariate normal distribution (MND) and how to perform multivariate analogues of the t-test, ANOVA and more.
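The multivariate analogue of the one-sample t-test is Hotelling's T²: T² = n (x̄ − μ0)' S⁻¹ (x̄ − μ0), where S is the sample covariance matrix. Under H0, (n − p) / (p (n − 1)) · T² follows an F distribution with p and n − p degrees of freedom. Below is a minimal Python sketch for the bivariate case p = 2. The function name and the toy data are hypothetical and only meant to show the arithmetic.

```python
def hotelling_t2_bivariate(data, mu0):
    """One-sample Hotelling T^2 and its F-scaled version for 2-dimensional data."""
    n, p = len(data), 2
    # sample mean vector
    m1 = sum(r[0] for r in data) / n
    m2 = sum(r[1] for r in data) / n
    # sample covariance matrix S (divisor n - 1); assumed non-singular
    s11 = sum((r[0] - m1) ** 2 for r in data) / (n - 1)
    s22 = sum((r[1] - m2) ** 2 for r in data) / (n - 1)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in data) / (n - 1)
    det = s11 * s22 - s12 ** 2
    # quadratic form d' S^{-1} d using the closed-form inverse of a 2x2 matrix
    d1, d2 = m1 - mu0[0], m2 - mu0[1]
    quad = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    t2 = n * quad
    f_stat = (n - p) / (p * (n - 1)) * t2  # compare with F(p, n - p)
    return t2, f_stat

# Made-up toy data: four bivariate observations.
toy = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
t2, f_stat = hotelling_t2_bivariate(toy, mu0=(0.0, 0.0))
```

The F-scaled statistic would then be compared with a quantile of the F distribution with 2 and n − 2 degrees of freedom.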
Block 3: PCA, FA and CCA
- Can we find dependencies within sets of points? Between sets of points?
- Can we use these dependencies to reduce the dimensionality of the data?
[Figure: scatter plot matrix of the variables Neg.Temp, Manuf, Pop, Wind, Precip and Days]
Block 3: PCA, FA and CCA
Course goals: In order to pass the course (grade 3) the student should...
- be able to use principal component and factor analysis for typical problems
- be able to use canonical correlation analysis
We learn techniques for reducing problems to lower dimensions and for studying dependencies between sets of observations.
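To make the idea of dimension reduction concrete: in principal component analysis the components are the eigenvectors of the covariance matrix, and each eigenvalue is the variance along its component. For two variables everything can be written in closed form. The Python sketch below assumes a made-up covariance matrix, and the function name is hypothetical.

```python
import math

def pca_bivariate(s11, s12, s22):
    """Principal components of the covariance matrix [[s11, s12], [s12, s22]]."""
    # eigenvalues of a symmetric 2x2 matrix in closed form
    root = math.hypot(s11 - s22, 2 * s12)
    lam1 = (s11 + s22 + root) / 2
    lam2 = (s11 + s22 - root) / 2
    # eigenvector belonging to lam1: direction of the first principal component
    if s12 != 0:
        v = (s12, lam1 - s11)
    else:
        v = (1.0, 0.0) if s11 >= s22 else (0.0, 1.0)
    norm = math.hypot(*v)
    direction = (v[0] / norm, v[1] / norm)
    return lam1, lam2, direction

# Made-up covariance matrix with variances 5/3 and covariance 1.
lam1, lam2, direction = pca_bivariate(5 / 3, 1.0, 5 / 3)
share = lam1 / (lam1 + lam2)  # proportion of the total variance on the first component
```

If `share` is close to 1, the data can be summarized by the first component alone with little loss of information.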
Block 4: Classification & cluster analysis
- Using information about different categories, can we classify new observations as belonging to one of the categories?
- Can we identify clusters of points in the data set, i.e. observations with similar properties?
[Figures: scatter plot of x[,2] against x[,1]; CLUSPLOT(votes.diss) of Component 2 against Component 1, with the caption "These two components explain 18.87 % of the point variability."]
Block 4: Classification & cluster analysis
Course goals: In order to pass the course (grade 3) the student should...
- be able to use classification techniques
- be familiar with methods for multivariate cluster analysis
We study old-fashioned and modern classification techniques and look at different methods for clustering.
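As one familiar example of a clustering method, k-means alternates between assigning each point to its nearest centre and moving each centre to the mean of its points. The slides do not say which methods the course covers, so the Python sketch below is only an illustration. The function names, starting centres and toy points are made up.

```python
def nearest(point, centers):
    """Index of the centre closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centers[j])),
    )

def kmeans(points, centers, iterations=10):
    """Plain k-means with user-supplied starting centres."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        # assignment step: group the points by nearest centre
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # update step: move each centre to its cluster mean (keep it if the cluster is empty)
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else old
            for cl, old in zip(clusters, centers)
        ]
    labels = [nearest(p, centers) for p in points]
    return centers, labels

# Made-up toy data: two well-separated groups of three points.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(pts, centers=[(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The result depends on the starting centres, which is why k-means is usually run from several random starts.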
Examination
- Course goal: be able to present arguments in mathematical statistics to others
- Four mandatory homeworks
  - Bonus problems can give higher grades
  - Feedback: possible to hand in more than once
- Oral presentations of clustering methods
- Take-home exam
  - Homeworks and oral presentation must be OK
  - Date for exam?
Previous course evaluations
- "The book was not very up-to-date on some topics."
  - We're still using the same book, since we haven't found a suitable replacement. More recent developments will be discussed during the lectures.
- "Some homework problems and a computer exercise about classification and discrimination would be good."
  - This has been added!
- "The take-home exam was a great idea for a course like this."
  - We'll have a take-home exam this time as well.
Computer exercises
- Three scheduled computer exercises
- Physical presence at the exercises is not mandatory, but strongly recommended
- Software: R
  - Download it for free from www.r-project.org
  - If you're not familiar with R, take a look at the file r-intro.pdf in the student portal!
- Remember: you can always contact me if you have questions about R or other parts of the course!
Course homepage
- Information, files and more can be found at:
  - studentportalen.uu.se