INTRODUCTION TO MULTIVARIATE STATISTICS
Dr. Debdulal Dutta Roy, Ph.D. (Psy.)
Psychology Research Unit
Indian Statistical Institute
Kolkata – 700 108
E-mail: [email protected] (o)
[email protected]
http://www.isical.ac.in/~ddroy/abstract.html
The Institute
• The Indian Statistical Institute (ISI) is a unique institution devoted to the research, teaching and application of statistics, natural sciences and social sciences. Founded by Professor P.C. Mahalanobis in Kolkata on 17th December, 1931, the institute gained the status of an Institution of National Importance by an act of the Indian Parliament in 1959.
• Research in Statistics and related disciplines is the primary activity of the Institute. Teaching activities are undertaken mainly in Kolkata, Delhi and Bangalore.
MYTHS
• Statistical treatment of more than 2 variables is multivariate statistics.
  – No. Multivariate statistics can be used when more than 2 variables are interrelated with each other.
Myth 2
• The purpose of multivariate statistics is to establish correlation among sets of variables.
  – True, but its purpose is not limited to determining relations among sets of variables. It also controls the effect of intervening variables on the relationship among sets of variables.
Myth 3
• Loss of original scores.
  – Accepted, if the analysis extracts more latent properties within the variables.
What is MVS?
• MVS refers to a set of statistical tools for finding the pattern of relationships among a set of variables: independent, dependent and intervening variables.
• The definition suggests that MVS cannot be used when the variables are not correlated with each other.
• Therefore, before applying MVS, it is necessary to examine the correlations among the variables.
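The correlation check described above can be sketched in a few lines. This is only an illustration: the three item names and the six ratings per item are hypothetical, and `pearson_r` is a helper written for this sketch, not part of any study reported here.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical ratings from six students on three items (illustrative only).
scores = {
    "classroom": [3, 4, 2, 5, 4, 1],
    "toilet": [2, 4, 1, 5, 3, 1],
    "games": [4, 1, 3, 2, 5, 3],
}

# Pairwise correlations: the screening step before any multivariate analysis.
names = list(scores)
pairwise = {}
for i, a in enumerate(names):
    for b in names[i + 1:]:
        pairwise[(a, b)] = pearson_r(scores[a], scores[b])
        print(f"{a} vs {b}: r = {pairwise[(a, b)]:.2f}")
```

If most pairwise correlations are close to zero, there is little shared structure for a multivariate tool to describe.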
List of Multivariate Statistical Tools
Can we assess their perception, beliefs and attitudes?
List of Multivariate Statistical Tools
• Determining differences among groups:
  – ANOVA with factorial design
  – MANOVA
  – Discriminant function analysis
• Determining structure of relationship:
  – Multiple regression
  – Canonical correlation
  – Principal component analysis
  – Correspondence analysis
  – Cluster analysis
List of Multivariate Research Questions on Women and Child Development
Multivariate Research Questions for Women & Child Development (Difference Perspective)
• Multiple Regression
  – What is the predictive strength of poverty, inequality, war, criminal networks, ruthless demand for cheap labour and commercial sexual exploitation in predicting motivation for human trafficking?
• Factorial Design
  – Does the eating habit (DV) of women vary with age, education and socio-economic status (IVs)?
• MANOVA
  – Do the food beliefs of pregnant mothers vary with religion?
• Discriminant function analysis
  – What is the predictive capacity of a food attitude questionnaire to classify students in terms of whether they take the mid-day meal?
Multivariate Research Questions for Women & Child Development (Relation Perspective)
• Canonical correlation
  – Is there any relation between awareness of nutrition and motivation to follow good food-taking habits?
• Principal component analysis
  – What is the meaning of good food-taking motivation?
• Correspondence analysis
  – Can we map different districts of one state in terms of human trafficking?
• Cluster analysis
  – Is it possible to classify states in terms of immunization?
Some studies on Application of Multivariate Statistics
Principal Component Analysis
• Principal component analysis is a technique (1) to reduce the number of variables and (2) to detect structure in the relationships between variables, that is, to classify variables. Therefore, PCA is applied as a data reduction or structure detection method. In principal component analysis, we seek linear composites of the original variables that display certain desirable properties, namely, scores that exhibit maximal variance, subject to being uncorrelated with previously computed composites.
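The idea of linear composites with maximal variance that are uncorrelated with earlier composites can be sketched as an eigen-decomposition of the correlation matrix. This is a minimal sketch assuming NumPy; the data are synthetic (four variables generated from two hypothetical latent sources) and do not come from any study in these slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 respondents, 4 variables built from 2 latent sources.
latent = rng.normal(size=(200, 2))
X = np.column_stack([
    latent[:, 0] + 0.3 * rng.normal(size=200),
    latent[:, 0] + 0.3 * rng.normal(size=200),
    latent[:, 1] + 0.3 * rng.normal(size=200),
    latent[:, 1] + 0.3 * rng.normal(size=200),
])

# Standardise the variables, then eigen-decompose their correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)          # eigh returns ascending order
order = np.argsort(eigvals)[::-1]             # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Component scores: linear composites of the standardised variables.
scores = Z @ eigvecs

print("eigenvalues:", np.round(eigvals, 2))
print("share of variance:", np.round(eigvals / eigvals.sum(), 2))
```

The variance of each column of `scores` equals the corresponding eigenvalue, and the columns are mutually uncorrelated, which is the property the definition above describes.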
Study on Principal Component Analysis
• Purpose: To determine the principal components of school infrastructure perception.
• Assumption: School infrastructure perception encompasses a set of 13 variables: students' perception of school infrastructure such as Classroom, Drinking Water, Toilet, Blackboard, Teachers, Book, Teaching learning materials, Friends, Games, Cultural Programs, Book bank, Mid-day Meal, and Health Check-up. Most of these infrastructures are available in rural schools. It is assumed that there is some latent structure in the perception of these 13 variables.
                        1       2       3       4       5       6       7       8       9       10      11      12      13
 1 Classroom            1
 2 Drinking Water       0.25**  1
 3 Toilet               0.28**  0.61**  1
 4 Black board          0.33**  0.09*   0.22**  1
 5 Teaching             0.38**  0.22**  0.27**  0.38**  1
 6 Book                 0.36**  0.11*   0.23**  0.41**  0.43**  1
 7 TLM                  0.13**  -0.06   0.05    0.20**  0.24**  0.23**  1
 8 Friend               0.32**  0.30**  0.34**  0.21**  0.29**  0.31**  0.24**  1
 9 Games                0.08*   0.09*   0.11*   0.14**  0.24**  0.21**  0.33**  0.31**  1
10 Cultural programme   0.11*   0.05    0.07    0.14**  0.34**  0.21**  0.24**  0.27**  0.30**  1
11 Book bank            0       0.17**  0.20**  -0.02   -0.01   -0.04   0.07    0.18**  0.15**  0.05    1
13 Health checkup       0.33**  0.21**  0.29**  0.20**  0.22**  0.11*   0.01    0.25**  0.04    -0.03   0.08*   0.19**  1
Significant correlations suggest possible latent structure of relationship signifying latent meaning
Extraction of Factors using PCA
Infrastructures        Basic            Supportive       Activity based
                       Infrastructure   Infrastructure   Infrastructure
Class room             0.62             0.35             -0.03
Drinking Water         0.06             0.81             -0.03
Toilet                 0.2              0.79             0.03
Black board            0.7              0.06             0.1
Teaching               0.63             0.19             0.29
Book                   0.68             0.07             0.25
TLM                    0.21             -0.09            0.63
Friend                 0.24             0.49             0.44
Games                  0.07             0.11             0.74
Cultural Programme     0.19             -0.01            0.63
Book bank              -0.35            0.47             0.36
Mid-day meal           0.49             -0.02            0.26
Health checkup         0.38             0.46             -0.17
Eigen Value            3.5              1.67             1.34
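The eigenvalues in the last row can be turned into shares of variance. The sketch below assumes the analysis was run on the correlation matrix of the 13 standardised variables, so that the total variance equals 13; the slides do not state this, so the percentages are indicative only.

```python
# Eigenvalues copied from the table above.
eigenvalues = {"Basic": 3.5, "Supportive": 1.67, "Activity based": 1.34}

# Assumption: 13 standardised variables, so total variance = 13.
n_variables = 13

share = {name: 100 * value / n_variables for name, value in eigenvalues.items()}
total_share = sum(share.values())

for name, pct in share.items():
    print(f"{name}: {pct:.1f}% of total variance")
print(f"Three components together: {total_share:.1f}%")
```

Under that assumption the three components account for roughly half of the total variance in the 13 perception variables.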
Limitation of PCA
• PCA is applicable when variables are measured on interval or ratio scales.
• When variables are measured on a nominal (categorical) scale, correspondence analysis is a useful statistical tool.
Correspondence Analysis
• Correspondence analysis is an exploratory multivariate technique that converts frequency table data into graphical displays in which rows and columns are depicted as points. It provides a method for comparing row or column proportions in a two-way or multiway table. CA investigates the magnitude and the substantive nature of the association between the row and column categories of a cross tabulation, rather than confirming or rejecting hypotheses about the underlying process.
• These methods were originally developed in France by Jean-Paul Benzécri in the 1960s and 1970s, and gained importance through the classic text by Greenacre (1984).
• Other names: correspondence mapping, perceptual mapping, social space analysis, correspondence factor analysis, principal components analysis of qualitative data, and dual scaling.
• Types: Simple and Multiple.
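Simple correspondence analysis can be sketched as a singular value decomposition of the standardised residuals of a frequency table. This is a minimal sketch assuming NumPy; the 3 x 3 table is hypothetical, and the variable names are chosen for illustration only.

```python
import numpy as np

# Hypothetical frequency table (rows: tasks, columns: rating categories).
N = np.array([[20, 10, 5],
              [8, 15, 12],
              [4, 9, 22]], dtype=float)

P = N / N.sum()              # correspondence matrix (proportions)
r = P.sum(axis=1)            # row masses
c = P.sum(axis=0)            # column masses

# Standardised residuals from the independence model r c^T.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

# Principal coordinates: the points plotted in a correspondence map.
row_coords = (U * sv) / np.sqrt(r)[:, None]
col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]

# Total inertia equals the chi-square statistic divided by the grand total.
total_inertia = (sv ** 2).sum()
print("inertia per dimension:", np.round(sv ** 2, 4))
print("row coordinates (first two dimensions):")
print(np.round(row_coords[:, :2], 3))
```

Plotting the first two columns of `row_coords` and `col_coords` on the same axes gives the graphical display in which rows and columns appear as points.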
Study on CA
• Purpose: To determine the correspondence between computer programming tasks and their relative use.
• Assumption: Use of the 14 computer programming tasks varies.
• Data set: 14 programming tasks (row variables) and 5 ratings of use (column variables).
INPUT TABLE FOR CA
Cross tabulation of 14 rows (computer programming tasks) x 5 columns (rating categories).
Assumption: Some tasks are related to each other; some are used more frequently and some less frequently.
Cluster Analysis
• Cluster analysis helps to identify similar entities on the basis of the characteristics they possess. It helps to classify objects or variables having functional homogeneity. The resulting object clusters should exhibit high internal homogeneity (within cluster) and high external heterogeneity (between any two clusters). It is an inductive treatment and a purely empirical method of classification.
Tree diagram based on cluster analysis
[Figure: Tree Diagram for students of all communities in Tripura based on Reading Motive. Complete linkage, Euclidean distances. Axis: Linkage Distance, 0.0 to 3.5. Communities shown: OTH_TR, HALAM, JAMATIA, REANG, NON_TR, CHAKMA, TRIPURI.]
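The combination of complete linkage and Euclidean distance named in the tree diagram can be sketched in plain Python. The five group profiles below are hypothetical and are not the Tripura data; `complete_linkage` is a helper written for this sketch.

```python
from itertools import combinations
from math import dist

# Hypothetical group profiles: mean scores on two reading-motive scales.
profiles = {
    "A": (1.0, 1.2),
    "B": (1.1, 1.0),
    "C": (3.0, 3.1),
    "D": (3.2, 2.9),
    "E": (5.0, 0.5),
}

def complete_linkage(points):
    """Agglomerative clustering: at each step merge the two clusters whose
    farthest members are closest (complete linkage, Euclidean distance)."""
    clusters = [(name,) for name in points]
    merges = []
    while len(clusters) > 1:
        def gap(pair):
            a, b = pair
            return max(dist(points[x], points[y]) for x in a for y in b)
        a, b = min(combinations(clusters, 2), key=gap)
        merges.append((a, b, gap((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

merges = complete_linkage(profiles)
for left, right, height in merges:
    print(f"merge {left} + {right} at linkage distance {height:.2f}")
```

Each printed merge corresponds to one join in a tree diagram, and the linkage distance is the position of that join on the distance axis.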
MANOVA
• MANOVA is a tool to determine significant differences among groups on a set of correlated dependent variables.
Variables   Wilks' Lambda   Rao's R   df        P-value
Gender      0.97            1.76      9,558     NS
Religion    0.92            5.06      9,532     0
S-E-S       0.9             3.03      9,242     0
District    0.35            14.44     452,499   0
Blocks      0.9             6.49      9,562     0
School      0.77            5.13      9,153     0
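Wilks' Lambda in the table is the ratio of the within-groups generalised variance to the total generalised variance, so values near 1 indicate little group difference and values near 0 indicate strong difference. A minimal sketch assuming NumPy, with hypothetical scores on two dependent variables for two groups:

```python
import numpy as np

# Hypothetical scores on two correlated dependent variables for two groups.
groups = {
    "group_1": np.array([[4.0, 3.5], [5.0, 4.5], [6.0, 5.0], [5.5, 4.0]]),
    "group_2": np.array([[2.0, 2.5], [3.0, 2.0], [2.5, 3.5], [3.5, 3.0]]),
}

def sscp(data):
    """Sums of squares and cross-products about the column means."""
    centred = data - data.mean(axis=0)
    return centred.T @ centred

W = sum(sscp(g) for g in groups.values())        # within-groups SSCP
T = sscp(np.vstack(list(groups.values())))       # total SSCP
wilks_lambda = np.linalg.det(W) / np.linalg.det(T)

print(f"Wilks' Lambda = {wilks_lambda:.3f}")
```

Converting Lambda into a test statistic such as Rao's R and a P-value requires the degrees of freedom and is left to a statistical package.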
Fisher’s Linear Discriminant Functions for differentiating Schools with Good and Poor Infrastructure
Attitudinal Dimensions   Good Infrastructure   Poor Infrastructure
Cleanliness              5.05                  -0.46
Safety                   -1.27                 -0.45
Comfort                  4.93                  3.76
Reliability              6.2                   4.51
Equal Opportunity        2.58                  2.03
Constant                 -26.27                -15.57

Eigen Value   Canonical Correlation   Wilks' Lambda   Chi-Square   Df   P-Value
0.9           0.687                   0.53            101.41       5    0
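Fisher's classification functions are applied by computing one score per group and assigning the school to the group with the larger score. The sketch below uses the coefficients from the table; the two school profiles and their rating scale are hypothetical. It also checks the summary row using the standard single-function relations, canonical correlation = sqrt(eigenvalue / (1 + eigenvalue)) and Wilks' Lambda = 1 / (1 + eigenvalue), which reproduce the reported values only approximately because the eigenvalue is rounded.

```python
from math import sqrt

# Coefficients copied from the table above.
FUNCTIONS = {
    "Good": {"Cleanliness": 5.05, "Safety": -1.27, "Comfort": 4.93,
             "Reliability": 6.2, "Equal Opportunity": 2.58, "Constant": -26.27},
    "Poor": {"Cleanliness": -0.46, "Safety": -0.45, "Comfort": 3.76,
             "Reliability": 4.51, "Equal Opportunity": 2.03, "Constant": -15.57},
}

def classify(profile):
    """Return (predicted group, score per group) for one school's scores."""
    scores = {
        group: w["Constant"] + sum(w[dim] * value for dim, value in profile.items())
        for group, w in FUNCTIONS.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical schools; the rating scale is an assumption, not from the study.
dimensions = ["Cleanliness", "Safety", "Comfort", "Reliability", "Equal Opportunity"]
low = dict.fromkeys(dimensions, 1.0)
high = dict.fromkeys(dimensions, 2.0)
print("low ratings ->", classify(low)[0])
print("high ratings ->", classify(high)[0])

# Consistency check on the summary row (single discriminant function).
eigenvalue = 0.9
canonical_r = sqrt(eigenvalue / (1 + eigenvalue))
wilks_lambda = 1 / (1 + eigenvalue)
print(f"canonical correlation ~ {canonical_r:.2f}, Wilks' Lambda ~ {wilks_lambda:.2f}")
```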
Classification Matrix of Good and Poor schools in terms of infrastructure availability
                             Predicted Group:       Predicted Group:
Original Group               Good Infrastructure    Poor Infrastructure    Total
Count        Good            75                     10                     85
             Poor            18                     60                     78
             Total           93                     70                     163
Percentage   Good            88.2                   11.8                   100
             Poor            23.1                   76.9                   100
Correct Classification Percentage = (75 + 60) / 163 x 100 = 82.8
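The percentages in the classification matrix follow directly from the counts. A short sketch of the arithmetic, using only the counts reported above:

```python
# Counts from the classification matrix: counts[original][predicted].
counts = {
    "Good": {"Good": 75, "Poor": 10},
    "Poor": {"Good": 18, "Poor": 60},
}

# Row percentages: share of each original group assigned to each predicted group.
row_pct = {
    original: {pred: 100 * n / sum(row.values()) for pred, n in row.items()}
    for original, row in counts.items()
}

# Overall correct classification: diagonal cells over the grand total.
total = sum(sum(row.values()) for row in counts.values())
correct = sum(counts[g][g] for g in counts)
accuracy = 100 * correct / total

for original, row in row_pct.items():
    print(original, {pred: round(p, 1) for pred, p in row.items()})
print(f"Correct classification: {accuracy:.1f}%")
```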
Box-plot Analysis of Discriminant Scores between Good and Poor Infrastructure Schools.
[Figure: box plots of Discriminant Scores from Function 1 for Analysis 1 (vertical axis, -5 to 4) by Predicted Group for Analysis 1 (horizontal axis); group sizes N = 93 and N = 70.]
Some of my studies on MVS
• Dutta Roy, D. (2007). Taxonomic approach in job analysis. In S. Subramony and S.B. Raj (Eds.), Psychological Assessment in Personnel Selection. Delhi: Defense Institute of Psychological Research, pp. 25-39.
• Dutta Roy, D. (2006). Clustering academic profiles of tribal and non-tribal school students of Manipur. Journal of Psychometry, 20(2), 1-12.
• Dutta Roy, D. (2002). Personality differences across four metropolitan cities of India. Indian Psychological Review, 58(2), 71-78.
• Dutta Roy, D. and Bannerjee, I. (1998). Correspondence analysis between stimulus length and amount of forgetting in assessment of short term memory span. Indian Journal of Psychometry and Education, 29(1), 7-12.