IAALD AFITA WCCA2008 WORLD CONFERENCE ON AGRICULTURAL INFORMATION AND IT

Comparing Multivariate Statistical Techniques and Supervised and Unsupervised Neural Networks in Identifying Subspecies and Origins of Wheat Accessions

Javad Khazaei 1, Mohammad R Naghavi 2, Mansoureh Danesh 1 and Rasoule Amirian 2
1 University College of Abouraihan, University of Tehran, Tehran, Iran. [email protected]
2 Plant Breeding Dept., Agricultural College, The University of Tehran, Karaj, Iran

Abstract

This paper focuses on the identification of wheat (Aegilops tauschii) accessions from their morphological features using artificial neural networks (ANN) and multivariate discriminant analysis (MDA) techniques. The use of ANN helps the process of identifying unknown crop accessions. We developed a supervised and an unsupervised ANN to identify 2 subspecies and 3 origins of 55 wheat accessions. Nineteen accessions were of the subspecies strangulata and 36 were of the subspecies tauschii. The 55 accessions were from Iran and other Middle East countries, including Turkey, Azerbaijan, Tadjikistan, Turkmenistan, Afghanistan and Armenia; the origin (country) of 10 accessions was unknown. The phenotypic diversity among accessions was determined by measuring spikelet number, peduncle length, stem length, spikelet weight, seed weight, spike length, number of fertile stems, days to 50% flowering, days to 50% maturity, spike firmness, stem color, spike color, awn color, stem state, spike thickness, seed shape, awn state, number of seeds per spikelet, and woolly leaf.

An unsupervised self-organizing map (SOM) and a supervised backpropagation algorithm (BP) were applied to classify 2 subspecies and 3 origins of wheat samples using 19 morphological features. The results obtained with the BP and SOM methods were checked against principal component analysis (PCA) and hierarchical cluster analysis (HCA) to see whether these well-known techniques would give similar results. Although PCA and HCA separated the two subspecies of Ae. tauschii, they could not group the accessions well according to their sites of origin. Both the BP and the SOM ANN performed better than PCA and HCA. Therefore, ANN could be considered as a support tool in the process of identifying unknown accessions.

Introduction

Wheat (Aegilops tauschii) is the most important cereal crop when considering either its world production or consumption. Cultivated bread and durum wheat descend from hybridized wild grasses. Eig (1929) divided Ae. tauschii into two subspecies, tauschii and strangulata. The definition and identification of crop accessions are of considerable scientific and practical importance in modern agriculture. Classical methods for the identification and classification of plant accessions are based on their morphological characteristics. Morphological characters can be used to estimate the variation within and between accessions.


In recent years, morphological data have been used to address the complex problem of defining and classifying crop accessions. Multivariate statistical techniques, mainly principal component analysis (PCA), linear discriminant analysis (LDA), and, to a lesser extent, hierarchical cluster analysis (HCA), have been widely used for the assessment and classification of plant accessions according to their morphological characteristics (Aghaei et al., 2008; Manjunatha et al., 2007). Hammer (1980) and Knaggs et al. (2000) studied the germplasm of Ae. tauschii using morphological characters.

DNA marker technologies such as random amplified polymorphic DNA (RAPD) and AFLP have also proven to be useful tools for the characterization of varieties (Malaki et al., 2006; Peng et al., 2003). Molecular methods are precise, but they are very expensive. LDA and PCA are linear transformations that are well suited to separating multidimensional data for different objects or classes (Aghaei et al., 2008; Manjunatha et al., 2007). Linear transforms typically extract information only from the second-order correlations in the data (the covariance matrix) and ignore higher-order correlations. Many researchers have reported that much multidimensional real-world data is inherently non-symmetric (Scholkopf et al., 1998; Siripatrawan, 2008). As an alternative to multidimensional analysis of variance, a newer technique, the artificial neural network, may be applied to the identification and classification of crop accessions, with the hope of making identification easier, faster, and even automatic. Artificial neural networks (ANNs) are among the most commonly used nonlinear techniques.
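The point that linear transforms such as PCA rely only on the covariance matrix can be illustrated with a small sketch. It is not the procedure used in this study: the function names (`covariance_matrix`, `leading_eigenvector`, `first_pc_scores`) are hypothetical, the data in the usage note are invented, and power iteration is used here only to keep the example free of external libraries.

```python
import math
import random


def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centered rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered


def leading_eigenvector(cov, iterations=500, seed=0):
    """Approximate the dominant eigenvector of cov by power iteration.

    Assumption: the positive random start vector is not orthogonal to the
    dominant eigenvector; if it were, the iteration would not find it.
    """
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.random() + 0.1 for _ in range(d)]
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate case: cov maps v to the zero vector
        v = [x / norm for x in w]
    return v


def first_pc_scores(rows):
    """Project mean-centered rows onto the first principal axis."""
    cov, centered = covariance_matrix(rows)
    axis = leading_eigenvector(cov)
    scores = [sum(c[j] * axis[j] for j in range(len(axis))) for c in centered]
    return axis, scores
```

Calling `first_pc_scores([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])` returns an axis along the direction (1, 2) and one score per row. Everything the function uses comes from means and pairwise covariances, which is why structure that is only visible in higher-order statistics is lost.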

Two main classes of ANNs have been used in computer-based expert systems for identifying biological objects: supervised ANNs and unsupervised ANNs. Supervised learning ANNs are calibrated to classify samples using a training set for which the desired target value of each sample is known and specified. The aim of this learning procedure is to find a mapping from input patterns to targets, in this case a mapping from patterns of morphological features to accession classes. The multilayer perceptron (MLP), or feed-forward ANN, has been the most popular supervised network.

Unsupervised learning involves no target values. The term unsupervised means that knowledge of the crop is not learned from specific input-output examples. Unsupervised competitive learning is used in a wide variety of fields under a wide variety of names, the most common of which are data partitioning, classification, and cluster analysis. A popular unsupervised ANN for clustering is Kohonen's self-organizing map (SOM), devised mainly for visualizing nonlinear relations in multidimensional data (Kohonen, 1995). SOM can be used for grouping complex sample data without any strict assumptions and without any a priori knowledge of the number of groups present (Giraudel and Lek, 2001). With unsupervised networks, patterns are presented to the network and it forms its own groupings of the data. In contrast, with supervised training, which is appropriate for identification, data patterns of known identity are presented to the ANN as exemplars. Once the network is trained, any data pattern can be presented to it and the output analysed to find the most likely identity of that pattern.

ANNs can be applied as an alternative to various statistical procedures, and are particularly useful in cases of non-linear relationships between predictor and dependent variables (Basheer and Hajmeer, 2000).
Several studies have shown that, for the purpose of classification, ANNs often have superior predictive performance compared to conventional statistical procedures such as discriminant analysis and logistic regression (e.g. Manel et al., 1999). Moshou et al. (2005) found that the performance of SOM-ANNs in the classification of plant disease was better than that of quadratic discriminant analysis methods. Kurdthongmee (2008) reported that SOM could be used successfully to classify wood boards of naturally different shades and colours based on their colour features. The self-organizing map was also used successfully to classify bryophytes according to the concentrations of chemical elements (Samecka-Cymerman et al., 2007).
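To make the unsupervised idea concrete, the following is a minimal sketch of a one-dimensional Kohonen map, in which the network forms its own groupings without target values. The map size, learning-rate schedule, neighbourhood radius, and the names `train_som` and `best_matching_unit` are all assumptions chosen for illustration; the settings used in this study are not given in this section.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the map node whose weight vector is closest to sample x."""
    return min(
        range(len(weights)),
        key=lambda k: sum((w - v) ** 2 for w, v in zip(weights[k], x)),
    )


def train_som(samples, n_nodes=4, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D self-organizing map; returns one weight vector per node.

    No class labels are used: each sample pulls its best-matching node,
    and that node's neighbours on the map, towards itself.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    # Initialise every node from a randomly chosen sample (an assumption).
    weights = [list(rng.choice(samples)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
        order = samples[:]
        rng.shuffle(order)
        for x in order:
            bmu = best_matching_unit(weights, x)
            for k in range(n_nodes):
                # Gaussian neighbourhood centred on the winning node.
                h = math.exp(-((k - bmu) ** 2) / (2.0 * radius ** 2))
                for j in range(dim):
                    weights[k][j] += lr * h * (x[j] - weights[k][j])
    return weights
```

After training, `best_matching_unit(weights, x)` assigns any feature vector `x` to a map node, and samples that land on the same or adjacent nodes can be read as one group. In practice the morphological features would normally be scaled to a common range first so that no single trait dominates the distance.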

Supervised ANNs have also been applied as an alternative to classical multivariate statistical techniques in the classification of crop varieties. For example, supervised ANNs have been used to classify olive varieties on the basis of chemical indices (Marini et al., 2004). Ceca and Moro (1997) reported that supervised ANNs could be considered a promising technique for developing support tools for the selection of new varieties. They found that multivariate discriminant analysis does not offer any substantial advantage over the traditional multidimensional analysis of variance for the selection of new varieties.
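The supervised case can be sketched in the same spirit: a small feed-forward network with one hidden layer, trained by backpropagation on samples whose class is known. This is an illustrative sketch only. The hidden-layer size, learning rate, number of epochs, cross-entropy loss, and the names `train_mlp`, `forward`, and `predict` are assumptions, not the architecture used by the authors.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(w1, w2, x):
    """Return hidden activations and the output for one sample."""
    hidden = [sigmoid(sum(w[j] * x[j] for j in range(len(x))) + w[-1])
              for w in w1]
    y = sigmoid(sum(w2[k] * hidden[k] for k in range(len(hidden))) + w2[-1])
    return hidden, y


def train_mlp(samples, labels, n_hidden=3, epochs=2000, lr=0.5, seed=0):
    """Train a one-hidden-layer network by stochastic backpropagation."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # The last entry of each weight vector is the bias term.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(dim + 1)]
          for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            hidden, y = forward(w1, w2, x)
            # Output error for a sigmoid unit with cross-entropy loss.
            delta_out = y - target
            # Propagate the error back through the hidden layer,
            # using the output weights from before their update.
            delta_hidden = [delta_out * w2[k] * hidden[k] * (1.0 - hidden[k])
                            for k in range(n_hidden)]
            for k in range(n_hidden):
                w2[k] -= lr * delta_out * hidden[k]
            w2[n_hidden] -= lr * delta_out
            for k in range(n_hidden):
                for j in range(dim):
                    w1[k][j] -= lr * delta_hidden[k] * x[j]
                w1[k][dim] -= lr * delta_hidden[k]
    return w1, w2


def predict(w1, w2, x):
    """Class 1 if the network output is at least 0.5, otherwise class 0."""
    return 1 if forward(w1, w2, x)[1] >= 0.5 else 0
```

With two classes coded as 0 and 1, `train_mlp(samples, labels)` returns the weights and `predict(w1, w2, x)` assigns a class to a new feature vector. Distinguishing three origins would need one output unit per class or a similar extension, which this sketch does not cover.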

The objective of the present paper was to describe and evaluate the efficiency of supervised (back-propagation ANN) and unsupervised (self-organizing map) artificial neural networks in identifying the origin and subspecies of 55 wheat accessions based on their morphological characteristics. The aim is to explore new ways of identifying wheat accessions of unknown origin, or at least to find procedures that would help the process. The results were compared to those obtained with PCA, a common multivariate statistical technique for the analysis of crop accessions.

Materials and Methods

Fifty-five accessions of Aegilops tauschii were provided by the gene bank of the Agricultural College at the University of Tehran, Iran. Eighteen accessions were of the subspecies strangulata and 37 accessions were of the subspecies tauschii. The accessions evaluated were from Iran and other Middle East countries (Turkey, Azerbaijan, Tadjikistan, Turkmenistan, Afghanistan and Armenia); the origin (country) of 18 accessions was unknown (Table 1). Each accession was planted in 1 m long rows with 0.5 m row spacing at the experimental station of the Agricultural College at the University of Tehran, Iran, during 2004. For characterization and evaluation, morphological data were recorded following the descriptors established for Aegilops (IBPGR, 1981), with some modifications. Morphological data included 9 quantitative and 10 qualitative characters, as follows:

Data on days to 50% flowering (days fr