- Home
- Documents
*Multivariate Statistical Analysis (Math347) makchen/MATH4424/ · PDF fileMultivariate...*

Click here to load reader

View

237Download

1

Embed Size (px)

Multivariate Statistical Analysis (Math347)

Instructor: Kani Chen

An Overview

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 1 / 13

Math347 vs. Math341:

Blue collar vs. white collar.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 2 / 13

Math347 vs. Math341:

Blue collar vs. white collar.

Construction work vs. architecture.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 2 / 13

Math347 vs. Math341:

Blue collar vs. white collar.

Construction work vs. architecture.

Action vs. Drama.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 2 / 13

Math347 vs. Math341:

Blue collar vs. white collar.

Construction work vs. architecture.

Action vs. Drama.

Messy vs. clean.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 2 / 13

Math347 vs. Math341:

Blue collar vs. white collar.

Construction work vs. architecture.

Action vs. Drama.

Messy vs. clean.

Expect lots of sweat throughout this course.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 2 / 13

Structure of the course:

Part 1: Conventional streamline of elementary statistics:about means;linear regression.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 3 / 13

Structure of the course:

Part 1: Conventional streamline of elementary statistics:about means;linear regression.

Part 2: (Special) methodologies.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 3 / 13

Part I. Conventional problems.One sample problem:

X1, ..., Xn iid MN(,),

where Xi and are p-dimension, and is p p.Concerning with . The objective is to estimate with accuracyjustification.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 4 / 13

Part I. Conventional problems.One sample problem:

X1, ..., Xn iid MN(,),

where Xi and are p-dimension, and is p p.Concerning with . The objective is to estimate with accuracyjustification.

Xi is the (IQ, EQ) of the i-th randomly selected student from ouruniversity. p = 2.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 4 / 13

Paired sample problem:

(still one sample problem):(

X1Y1

)

...

(

XnYn

)

iid MN((

1

2

)

,),

where Xi , Yi , 1 and 2 are p-dimension, and is (2p) (2p)matrix.Concerning with the difference of 1 and 2, and objective:Estimating 1 2 with accuracy justification.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 5 / 13

Paired sample problem:

(still one sample problem):(

X1Y1

)

...

(

XnYn

)

iid MN((

1

2

)

,),

where Xi , Yi , 1 and 2 are p-dimension, and is (2p) (2p)matrix.Concerning with the difference of 1 and 2, and objective:Estimating 1 2 with accuracy justification.

(Xi , Yi) are the (IQ, EQ) of the i-th randomly selected student fromour university before and after a training program. (p = 2.) Doesthe training program make a difference?

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 5 / 13

Repeated measurement.

(Exactly one sample, but with a special care.)

X1, ..., Xn iid MN(,),

Concerning the differences of the components of .

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 6 / 13

Repeated measurement.

(Exactly one sample, but with a special care.)

X1, ..., Xn iid MN(,),

Concerning the differences of the components of .

Example: Xi is the returns of year 2006, 2007, 2008 of the i-thrandomly selected stock in HKEX. (p = 3). Is there a differencebetween the three years in terms of stock returns?The components are of the same nature and there are numericallycomparable.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 6 / 13

Two sample problem:

X1, ..., Xn iid MN(1,1),

Y1, ..., Ym iid MN(2,2),

and {Xi} are independent of {Yj}.Concerning the difference of 1 and 2, and the objective is toestimate 1 2 with accuracy justification.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 7 / 13

Two sample problem:

X1, ..., Xn iid MN(1,1),

Y1, ..., Ym iid MN(2,2),

and {Xi} are independent of {Yj}.Concerning the difference of 1 and 2, and the objective is toestimate 1 2 with accuracy justification.

Example: Xi is the monthly (income, spending) of the i-thrandomly selected household in Hong Kong. Yi is that inShanghai. (p = 2). Any difference between HK and SH in termsincome and spending?Special interest: 1 = 2.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 7 / 13

Several/Multiple sample problem:

X11, ..., X1n1 iid MN(1,),

Xg1, ..., Xgng iid MN(g,),

Concerning the differences between 1, 2, ..., g , and theobjective answer the question whether they any different withaccuracy justification.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 8 / 13

Several/Multiple sample problem:

X11, ..., X1n1 iid MN(1,),

Xg1, ..., Xgng iid MN(g,),

Concerning the differences between 1, 2, ..., g , and theobjective answer the question whether they any different withaccuracy justification.

Example: monthly (income, spending) for randomly selectedhouseholds in HK, SH and BJ. (p = 2 and g = 3).Any difference between HK, SH and BJ in terms of income andspending?MANOVA

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 8 / 13

Regression

Ordinary/univariate linear regression:

Y = X + .

where Y is of 1 dimension (uni);: 1 dimension; and X are r + 1 dimension.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 9 / 13

Regression

Ordinary/univariate linear regression:

Y = X + .

where Y is of 1 dimension (uni);: 1 dimension; and X are r + 1 dimension.

Example: Yi is math exam score and Xi is (1, time spent) onstudying for the i-th randomly selected student.Here 1 is for the intercept component.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 9 / 13

Regression

Multivariate linear regression:

Y = X + .

where Y as well as are of m dimension (uni);: (r + 1) m matrix;m univariate linear models putting together.Connected by the dependence of the components of .

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 10 / 13

Regression

Multivariate linear regression:

Y = X + .

where Y as well as are of m dimension (uni);: (r + 1) m matrix;m univariate linear models putting together.Connected by the dependence of the components of .

Example: Yi is (math, physics) exam scores and Xi same asabove.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 10 / 13

Part II:1. Principle component analysis and factor analysis.

Linearly combine the variables in study to produce the mostinfluential (hidden) variables.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 11 / 13

Part II:1. Principle component analysis and factor analysis.

Linearly combine the variables in study to produce the mostinfluential (hidden) variables.

Factor analysis, a refined method.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 11 / 13

Part II:1. Principle component analysis and factor analysis.

Linearly combine the variables in study to produce the mostinfluential (hidden) variables.

Factor analysis, a refined method.

Example: We have the exam scores of history, English, Chinese,math, physics and chemistry for a number of randomly selectedstudents. What are the most important factors that contribute tothe exam performance?

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 11 / 13

Part II:1. Principle component analysis and factor analysis.

Linearly combine the variables in study to produce the mostinfluential (hidden) variables.

Factor analysis, a refined method.

Example: We have the exam scores of history, English, Chinese,math, physics and chemistry for a number of randomly selectedstudents. What are the most important factors that contribute tothe exam performance?

Example: With the observations of the daily returns of PetroChina,Sinopec, CNOOC, Tencent, Kingsoft and Alibaba in the past year,can we find out what are the underlying driving forces that affectthe stock prices?

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 11 / 13

2. Canonical Analysis.

Find out and sort out the connection between two sets of variablesusing linear combinations.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 12 / 13

2. Canonical Analysis.

Find out and sort out the connection between two sets of variablesusing linear combinations.

Example: With the observations of the daily returns of PetroChina,Sinopec, CNOOC, Tencent, Kingsoft and Alibaba in the past year,can we find out and explain the relation, if any, between the stockperformances the oil industry and the internet industry?

Instructor: Kani Chen (HKUST) Multivariate