
Multivariate analysis for 26 rice grain varieties


DESCRIPTION

Case study: rice grain varieties, each with 4-5 replicates, with their 74 chemical constituents.

Objectives:
1. Classification of rice varieties.
2. Searching for responsible variables explaining total variability among the measurements.
3. Detection of superior varieties.


Page 1: Multivariate analysis for 26 rice grain varieties

UNIVERSITY OF KALYANI

M . Sc in Statistics

Multivariate analysis of Chemical Compositions

Related to some rice grain varieties

Team members : DIPIKA PATRA ARNAB JANA

Mentor: SADHAN SAMAR MAITY


meNTor: saDhaN swaPaN maITy

Page 2: Multivariate analysis for 26 rice grain varieties

DATA DESCRIPTION:

The data contain 26 varieties of rice grains, each with 4-5 replicates, with their 74 chemical constituents placed in 7 different groups (groups marked with different colours; see ..\Documents\RICE_RAW.xls), named as:

1> AMINO ACID
2> ORGANIC ACID chelated with salt
3> PHENOL
4> LIPIDS
5> CARBOHYDRATE
6> STEROL
7> NITROGEN containing RIBOSE

The measurements were generated by analyzing the rice extracts in a GC-MS instrument. The measurements are unit free.

Page 3: Multivariate analysis for 26 rice grain varieties

OBJECTIVE OF THE PROJECT:

• Trying for classification of rice varieties.

• Searching for responsible variables explaining total variability among the measurements.

• Detection of superior varieties.

ANALYTICAL SOFTWARE:

SAS, SPSS, MINITAB, MICROSOFT OFFICE

Page 4: Multivariate analysis for 26 rice grain varieties

STEPS OF ANALYSIS:

1> Preparing data

2> Multivariate techniques applied:
• One-way MANOVA
• Cluster Analysis (CA)
• Principal Component Analysis (PCA)
• Canonical Correlation Analysis (CCA)
• Multi-dimensional Scaling (MDS)
• Profile Analysis (PA)

3> Interpretation of the results

4> Conclusion

Page 5: Multivariate analysis for 26 rice grain varieties

ONE WAY MANOVA

Page 6: Multivariate analysis for 26 rice grain varieties

ONE-WAY MANOVA

The MANOVA procedure tests whether there is any significant difference among the 26 varieties across the 7 biochemical groups. We assume that the datasets concerned come from 26 homoscedastic multivariate normal populations with possibly different mean vectors.

The output is shown here: ..\Documents\MANOVA_OUTPUT.xlsx
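As a rough illustration of what a one-way MANOVA computes, the sketch below evaluates Wilks' lambda, one of the statistics that packages such as SAS and SPSS commonly report. The function name wilks_lambda and the synthetic data are assumptions made for illustration; this does not reproduce the project's actual output.

```python
import numpy as np

def wilks_lambda(X, labels):
    """Wilks' lambda = det(W) / det(T) for a one-way MANOVA layout.

    X      : (n_samples, n_variables) array of measurements
    labels : length-n_samples sequence of group (variety) labels
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    centered = X - X.mean(axis=0)
    T = centered.T @ centered                 # total SSCP matrix
    W = np.zeros_like(T)                      # within-group SSCP matrix
    for g in np.unique(labels):
        Xg = X[labels == g]
        dg = Xg - Xg.mean(axis=0)
        W += dg.T @ dg
    return np.linalg.det(W) / np.linalg.det(T)

# Hypothetical data: 3 "varieties", 20 replicates each, 2 constituents.
rng = np.random.default_rng(0)
labels = np.repeat(["A", "B", "C"], 20)
noise = rng.normal(size=(60, 2))
shift = np.repeat(np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]), 20, axis=0)

lam_separated = wilks_lambda(noise + shift, labels)  # well-separated mean vectors
lam_same = wilks_lambda(noise, labels)               # identical mean vectors
print(lam_separated, lam_same)
```

A value of lambda near 0 means the group mean vectors differ strongly relative to the within-group scatter, which corresponds to a very small p-value; a value near 1 means there is little evidence of a difference.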

Page 7: Multivariate analysis for 26 rice grain varieties

CONCLUSION:

The one-way MANOVA results show that all the p-values are very small, so we reject the hypothesis that all the varieties are the same. The mean vectors of the varieties differ significantly, and this holds for every biochemical group.

Page 8: Multivariate analysis for 26 rice grain varieties

CLUSTER ANALYSIS

Page 9: Multivariate analysis for 26 rice grain varieties

FINAL PARTITION

Using single linkage & Euclidean measure

Cluster 1: WA WB WC WD WE WG WH WI WJ WK WL WM RB RD RF

Cluster 2: WF

Cluster 3: WN WO WP WQ

Cluster 4: WR WT RA RC RE

Cluster 5: WS
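The idea behind single linkage with the Euclidean measure can be sketched in a few lines of plain Python. This is a naive illustration on made-up points, not the routine used in the project; the function name single_linkage and the toy coordinates are assumptions.

```python
from math import dist

def single_linkage(points, n_clusters):
    """Agglomerative clustering with single linkage and Euclidean distance.

    Starts with every point in its own cluster and repeatedly merges the two
    clusters whose closest members are nearest, until n_clusters remain.
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)

# Hypothetical 2-D points: two tight pairs and one far-away point.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (50.0, 50.0)]
toy_clusters = single_linkage(toy_points, 3)
print(toy_clusters)
```

In the toy data the far-away point ends up alone in its own cluster, in the same way that WF and WS form single-variety clusters in the partition above.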

Page 10: Multivariate analysis for 26 rice grain varieties

DENDROGRAM OF 26 VARIETIES:

Page 11: Multivariate analysis for 26 rice grain varieties

PRINCIPAL COMPONENT ANALYSIS

Page 12: Multivariate analysis for 26 rice grain varieties

Eigenvalues of the Covariance Matrix

     Eigenvalue    Difference    Proportion    Cumulative
1    744960.685    666389.993    0.8387        0.8387
2     78570.691     32024.09     0.0885        0.9271
3     46546.602     36036.23     0.0524        0.9795
4     10510.372      7521.53     0.0118        0.9914
5      2988.842       424.482    0.0034        0.9947
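The columns of the eigenvalue table are related by simple arithmetic, which the following sketch checks using only the figures printed in the table. The variable names are illustrative, and small discrepancies in the last digit are expected because the table shows rounded values.

```python
# Values copied from the eigenvalue table above.
eigenvalues = [744960.685, 78570.691, 46546.602, 10510.372, 2988.842]
proportions = [0.8387, 0.0885, 0.0524, 0.0118, 0.0034]

# "Difference" column: each eigenvalue minus the next one.
differences = [round(a - b, 3) for a, b in zip(eigenvalues, eigenvalues[1:])]

# "Cumulative" column: running sum of the proportions.
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(round(running, 4))

# Total variance implied by the first row (eigenvalue / proportion).
implied_total = eigenvalues[0] / proportions[0]

print(differences)
print(cumulative)
print(round(implied_total))
```

Read this way, the first component alone accounts for about 84% of the total variance, and the first four together for about 99%.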

Principal component vectors are shown here: ..\..\..\dipika\KU_PRINCIPAL_1.xls

We select the responsible variables whose contribution to the principal components is significant (with loading beyond ± 0.5).

Responsible variables (with their loadings):
P1 Guanine (0.89)
P2 Sucrose (0.91)
P3 Linoleic Acid (0.66)
P4 Phosphate (0.67)
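The selection rule (keep variables whose loading exceeds 0.5 in absolute value) can be sketched with NumPy as follows. The function name responsible_variables, the variable names and the synthetic data are all assumptions for illustration; the project's own loadings came from its statistical software.

```python
import numpy as np

def responsible_variables(X, names, n_components=2, threshold=0.5):
    """For each leading principal component of the covariance matrix, list the
    variables whose loading (eigenvector entry) exceeds the threshold in
    absolute value. Absolute loadings are reported because the sign of an
    eigenvector is arbitrary."""
    X = np.asarray(X, dtype=float)
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # largest variance first
    result = []
    for k in order[:n_components]:
        loadings = eigvecs[:, k]
        selected = [(names[j], round(float(abs(loadings[j])), 2))
                    for j in range(len(names)) if abs(loadings[j]) > threshold]
        result.append(selected)
    return result

# Hypothetical data: three independent variables with very different spreads.
rng = np.random.default_rng(1)
names = ["v_big", "v_mid", "v_small"]            # hypothetical variable names
X_demo = rng.normal(size=(200, 3)) * np.array([100.0, 10.0, 1.0])
picked = responsible_variables(X_demo, names, n_components=2)
print(picked)
```

Because the covariance matrix (rather than the correlation matrix) is decomposed, variables with large variance tend to dominate the leading components, as the toy data show.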

Page 13: Multivariate analysis for 26 rice grain varieties

CANONICAL CORRELATION ANALYSIS:

Page 14: Multivariate analysis for 26 rice grain varieties

Canonical correlation is a measure of association between two groups of random variables.

The following table gives the largest canonical correlation between each pair of groups.

GROUP    1        2        3        4        5        6        7
1        -        0.975    0.966    0.976    0.898    0.827    0.956
2        0.975    -        0.988    0.993    0.955    0.858    0.981
3        0.966    0.988    -        0.996    0.832    0.731    0.988
4        0.976    0.993    0.996    -        0.943    0.897    0.989
5        0.898    0.955    0.832    0.943    -        0.8      0.881
6        0.827    0.858    0.731    0.897    0.8      -        0.816
7        0.956    0.981    0.988    0.989    0.881    0.816    -
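Each entry of such a table is the first (largest) canonical correlation between two sets of variables. A minimal NumPy sketch of that quantity is given here, assuming both data matrices have full column rank and more observations than variables; the function name and the synthetic data are illustrative and not taken from the project.

```python
import numpy as np

def first_canonical_correlation(X, Y):
    """Largest canonical correlation between two sets of variables.

    X : (n, p) array, Y : (n, q) array; rows are the same n observations.
    Uses the QR/SVD route: the canonical correlations are the singular values
    of Qx.T @ Qy, where Qx and Qy are orthonormal bases of the centered data.
    """
    Xc = np.asarray(X, dtype=float)
    Yc = np.asarray(Y, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    Yc = Yc - Yc.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)   # sorted, largest first
    return float(min(s[0], 1.0))                     # guard against round-off

# Hypothetical data: one set that is a linear function of X, one unrelated set.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(200, 3))
Y_dependent = X_demo[:, :2] @ np.array([[1.0, 2.0], [3.0, -1.0]])
Y_independent = rng.normal(size=(200, 2))

r_dep = first_canonical_correlation(X_demo, Y_dependent)
r_ind = first_canonical_correlation(X_demo, Y_independent)
print(r_dep, r_ind)
```

A value close to 1 means some linear combination of one set is almost perfectly predictable from a linear combination of the other, which is how the near-unity entries in the table are read.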

Page 15: Multivariate analysis for 26 rice grain varieties

MULTIDIMENSIONAL SCALING :

Page 16: Multivariate analysis for 26 rice grain varieties

Outputs obtained using SPSS:

Page 17: Multivariate analysis for 26 rice grain varieties

Main points from SPSS output :

Page 18: Multivariate analysis for 26 rice grain varieties

CONCLUSION

A high canonical correlation indicates that more common (latent) factors interplay among the groups. From CCA, the table of first canonical correlations shows entries close to unity, indicating that a good number of common factors interplay among the 7 groups.

However, in the MDS analysis the 7 groups are found quite scattered, indicating their uniqueness rather than their similarities. This contradicts the earlier finding from CCA. It may be that the latent factors among the 7 groups are correlated (i.e., non-orthogonal). We dispense with factor analysis.
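To make the comparison between CCA and MDS concrete, the sketch below runs classical (Torgerson) MDS on the 7 groups. Two things here are assumptions and not taken from the slides: the dissimilarity is taken as 1 minus the first canonical correlation, and classical MDS is swapped in for whatever SPSS routine the project actually used. The result therefore need not match the SPSS configuration.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed n items in k dimensions from an
    (n, n) matrix of pairwise dissimilarities D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:k]          # indices of the k largest
    scale = np.sqrt(np.clip(eigvals[idx], 0.0, None))  # drop negative eigenvalues
    return eigvecs[:, idx] * scale

# First canonical correlations between the 7 groups, from the CCA table
# (diagonal set to 1: a group is perfectly correlated with itself).
R = np.array([
    [1.000, 0.975, 0.966, 0.976, 0.898, 0.827, 0.956],
    [0.975, 1.000, 0.988, 0.993, 0.955, 0.858, 0.981],
    [0.966, 0.988, 1.000, 0.996, 0.832, 0.731, 0.988],
    [0.976, 0.993, 0.996, 1.000, 0.943, 0.897, 0.989],
    [0.898, 0.955, 0.832, 0.943, 1.000, 0.800, 0.881],
    [0.827, 0.858, 0.731, 0.897, 0.800, 1.000, 0.816],
    [0.956, 0.981, 0.988, 0.989, 0.881, 0.816, 1.000],
])
D = 1.0 - R          # ASSUMPTION: dissimilarity = 1 - canonical correlation
coords = classical_mds(D, k=2)
print(coords.round(3))
```

Plotting the two columns of coords against each other gives a map in which groups with high mutual canonical correlation lie close together, which is one way to check how scattered the groups appear under this particular choice of dissimilarity.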

Page 19: Multivariate analysis for 26 rice grain varieties

SPECIAL ATTENTION TO AMINO ACID GROUP

Page 20: Multivariate analysis for 26 rice grain varieties
Page 21: Multivariate analysis for 26 rice grain varieties

Considering varieties from cluster 1

Page 22: Multivariate analysis for 26 rice grain varieties

Variety “WF”

Cluster 3

Page 23: Multivariate analysis for 26 rice grain varieties

Cluster 4

Cluster 5

Page 24: Multivariate analysis for 26 rice grain varieties

From the above 5 diagrams, no clear-cut view on the varieties can be reached. Indeed, superiority among the varieties is hard to judge without a suitable criterion framed beforehand.

Page 25: Multivariate analysis for 26 rice grain varieties

FINAL CONCLUSION:

• The varieties are significantly different.
• 4 variables, namely Guanine, Sucrose, Linoleic Acid & Phosphate, are detected as responsible variables capturing the maximum share of system variance.
• We can group the 26 varieties into 5 groups (or clusters).
• Canonical correlations among the 7 biochemical groups are found to be very high. So they are governed by internal common factor(s), which calls for factor analysis (FA). Due to shortage of time, FA was not conducted.
• CCA & MDS lead to contradictory results regarding the biochemical groups.
• From the profile diagrams, no clear-cut conclusion on the best variety is obtained.
• Correspondence Analysis on variety vs. constituents could give better insight into their level-wise correspondence. Due to shortage of time, such analysis could not be done.

Page 26: Multivariate analysis for 26 rice grain varieties

Thanking You

DIPIKA PATRA
ROLL NO: 96/STS/[email protected]

ARNAB JANA
ROLL NO: 96/STS/[email protected]