
[IEEE IGARSS 2012 - 2012 IEEE International Geoscience and Remote Sensing Symposium - Munich, Germany (2012.07.22-2012.07.27)] 2012 IEEE International Geoscience and Remote Sensing


VALIDATION OF BIPLS FOR IMPROVING YIELD ESTIMATION OF RICE PADDY FROM HYPERSPECTRAL DATA IN WEST JAVA, INDONESIA

Taichi Takayama¹, Atsushi Uchida¹, Hozuma Sekine¹, Kotaro Fukuhara¹, Keigo Yoshida¹, Osamu Kashimu², Sidik Muljono³, Arief D.³, M. Evri³, Muhamad. Sadly³

¹ Mitsubishi Research Institute, Inc., 2-10-3, Nagata-cho, Chiyoda-ku, Tokyo 100-8141, JAPAN
² Japan Space Systems, JAPAN
³ Agency for the Assessment and Application of Technology (BPPT), INDONESIA

ABSTRACT

Rice is one of the most important agricultural crops and a major staple food in Asian countries, especially Indonesia. Monitoring and management of paddy fields are key factors in ensuring national food security, and remote sensing technology and its data, especially hyperspectral data, are expected to be a highly effective solution. The objective of this paper is to use airborne hyperspectral data and yield data to develop and validate a high-performance prediction model for rice paddy yield based on a statistical technique.

Index Terms— Hyperspectral data, rice, yield, BiPLS, HyMAP

1. INTRODUCTION

In Asian countries, especially Indonesia, rice is one of the most important agricultural crops and a major staple food. A decrease in rice production directly affects national food security in Indonesia [1]. One of the major focuses of Indonesia’s food security policy is the rice supply, which continues to attract debate on how to support both farmers and consumers [2]. Accurate monitoring and management of paddy fields are therefore important for national food security. The Indonesian government recognizes that a low-cost and effective approach to monitoring and managing paddy fields is a key factor in ensuring national food security, and remote sensing technology and its data are expected to be a highly effective solution. Crop forecasting maps produced by remote sensing are an efficient basis for policy measures.

Many studies have estimated the biophysical characteristics of agricultural crops using multispectral data, which has limitations in providing accurate estimates [3], [4]. This limitation has motivated the use of hyperspectral data for such estimation, and some governments have planned to include hyperspectral sensors onboard the new generation of satellites [5]. The Ministry of Economy, Trade and Industry of Japan is currently planning to launch an earth observation sensor, HISUI (Hyper-spectral Imager SUIte). This research was conducted as part of a research and development project of Japan Space Systems focused on the exploitation of spaceborne hyperspectral data once the sensor is operational in orbit [6], [7], [8].

The purpose of this study is to develop a high-performance prediction model for rice paddy yield based on a statistical technique applied to hyperspectral data. The hyperspectral data for this study were acquired by the HyMap sensor over West Java, Indonesia, in 2008 and 2011.

Over the past 20 years or more, many studies have addressed yield estimation using hyperspectral data, whose large number of spectral bands provides rich optical information [9], [10]. Statistical techniques for building crop forecasting models from hyperspectral data need a sufficient number of training samples. In many cases, the limited number of field samples raises concern about the small-sample-size problem with high-dimensional data. This problem, combined with the complexity of prediction models, results in poor performance caused by model overfitting. The importance of crop forecasting with limited training data is shown in the GEOSS report [11]. In this study, Backward interval Partial Least Squares regression (BiPLS) was adopted for rice yield estimation to overcome these ill-posed problems.

This study proceeds in two steps to develop a high-accuracy prediction model. First, we apply BiPLS not only to reflectance data from the hyperspectral sensor but also to the first and second derivatives of the reflectance, in order to focus on the continuity of the spectral curve. Second, to validate the robustness of the model against data from a different year, we predicted yield in 2008 with two prediction models, one trained on 2011 data only and the other trained on 2011 and 2008 data, and validated both.

978-1-4673-1159-5/12/$31.00 ©2012 IEEE, IGARSS 2012


Table 1. Data set for training and validation

| # | Training data: yield amount data | Training data: hyperspectral data | Validation data: yield amount data | Validation data: hyperspectral data |
|---|---|---|---|---|
| Case 1 | 40 samples in 2011 | 2011 data | 11 samples in 2008, where rice paddy was harvested after August 20th | 2008 data |
| Case 2 | 40 samples in 2011; 16 samples in 2008, where rice paddy was harvested before August 20th | 2011 data; 2008 data | 11 samples in 2008, where rice paddy was harvested after August 20th | 2008 data |

2. SURVEY AREA AND DATA

Our study areas are Indramayu, Subang and Karawang, which are well known as major granaries of West Java (Fig. 1). Double and triple cropping of rice is common in these areas. The yield depends on irrigation conditions.

Fig. 1. Survey area in Indonesia (map labels: Indramayu, Karawang, Subang)

We conducted airborne observations and field campaigns at Subang and Indramayu in 2008 and at Karawang and Indramayu in 2011. An airborne hyperspectral sensor, HyMap, with 126 bands (450 nm - 2480 nm) was used for these observations and acquired images at 4.2 m spatial resolution. The data sets were acquired on June 30th (Indramayu) and July 1st (Subang) in 2008, and on July 13th (Karawang) in 2011. The actual yield used as training data was recorded after the field campaign. For the prediction model, we excluded from the training data samples from the very young seedling and late-harvest periods, whose spectral curves have a very weak relationship with yield. From the 2008 measurements, 30 samples (data sets) of HyMap data and yield data in Subang and Indramayu were prepared. From the 2011 measurements, 52 samples in Karawang were used for this analysis.

3. METHODOLOGY

3.1. Prediction model

A large number of spectral bands provides much information. Multiple regression analysis and Partial Least Squares (PLS) regression are simple and easy ways to predict yield. However, most statistical techniques for building crop forecasting models from hyperspectral data, including PLS, need a sufficient number of training samples; if the training data set is small, these models are often affected by less important bands and by overfitting. BiPLS, on the other hand, selects the bands that are more useful for prediction [12]. In this study, we selected BiPLS, a supervised method, to address these problems. BiPLS is a variable selection method dedicated to spectral data and uses PLS regression to build linear models. Generally, the spectral range is divided into a number of intervals for BiPLS. In this study, each band is treated as one interval (variable), and the aim of the BiPLS process is to remove the least relevant bands. Performance was evaluated by the root mean squared error (RMSE) from cross-validation.

3.2. Experimental Overview

In many cases, hyperspectral data are used as discrete data. On the other hand, first-derivative data are used in the analysis of multispectral data to focus on the continuity of the spectral curve. We used the first and second derivatives of the hyperspectral data to compare their prediction performance with that of reflectance data. The generalization capability of the prediction models was evaluated by 4-fold cross-validation repeated 100 times.
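The backward band elimination described in Section 3.1 can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names (`pls1_fit`, `cv_rmse`, `bipls_select`), the fixed number of PLS components, the fold assignment, and the synthetic data are all assumptions introduced here. Each band is treated as one interval; at every step the band whose removal gives the lowest cross-validated RMSE is dropped, and the band subset with the lowest RMSE along the way is kept.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS (NIPALS); return (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                      # weight vector (covariance direction)
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                         # scores
        tt = t @ t
        p = Xk.T @ t / tt                  # X loadings
        c = (yk @ t) / tt                  # y loading
        Xk = Xk - np.outer(t, p)           # deflate X and y
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression vector in band space
    return x_mean, y_mean, coef


def cv_rmse(X, y, n_components, n_folds, seed):
    """RMSE over one shuffled k-fold cross-validation pass."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = 0.0
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        k = min(n_components, X.shape[1], len(train) - 1)
        x_mean, y_mean, coef = pls1_fit(X[train], y[train], k)
        pred = (X[fold] - x_mean) @ coef + y_mean
        sq_err += np.sum((y[fold] - pred) ** 2)
    return np.sqrt(sq_err / len(y))


def bipls_select(X, y, n_components=3, n_folds=4, seed=0):
    """Backward elimination of single-band intervals by CV RMSE."""
    def score(cols):
        # Same seed for every candidate, so all subsets share the same folds.
        return cv_rmse(X[:, cols], y, n_components, n_folds, seed)

    bands = list(range(X.shape[1]))
    best_bands, best_rmse = bands[:], score(bands)
    while len(bands) > 1:
        # Try dropping each remaining band; keep the most helpful removal.
        rmse, drop = min((score([b for b in bands if b != d]), d) for d in bands)
        bands.remove(drop)
        if rmse <= best_rmse:
            best_bands, best_rmse = bands[:], rmse
    return best_bands, best_rmse


if __name__ == "__main__":
    # Synthetic stand-in for band values and yield (not data from the paper).
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 10))
    y = 2.0 * X[:, 2] - 1.5 * X[:, 7] + 0.05 * rng.normal(size=40)
    print(bipls_select(X, y))
```

Reusing the same folds for every candidate subset keeps the comparison between subsets fair. In practice the number of PLS components would itself be chosen by cross-validation rather than fixed.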

In future practical use of this prediction model, sufficient training data will not always be available every year, and a prediction model built from a different year’s data may be necessary. In this study, to validate the robustness of the prediction model against data from a different year and to learn which training data sets are suitable for a high-accuracy model, two cases of training and validation data sets were set up. In one case the training data set was selected from 2011 data only; in the other it was selected from 2011 and 2008 data. These data sets for comparative verification are shown in Table 1. We validated the performance of both cases.
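The derivative spectra and the repeated 4-fold cross-validation of Section 3.2 can be sketched as below. The paper does not state how the derivatives were computed, so finite differences via `numpy.gradient` are an assumption, as are the function names `derivative_spectra` and `repeated_kfold` and the evenly spaced wavelength grid used in the usage note.

```python
import numpy as np


def derivative_spectra(reflectance, wavelengths_nm):
    """First and second derivatives of each spectrum along the band axis.

    reflectance: array of shape (n_samples, n_bands)
    wavelengths_nm: band-centre wavelengths, shape (n_bands,)
    """
    d1 = np.gradient(reflectance, wavelengths_nm, axis=1)
    d2 = np.gradient(d1, wavelengths_nm, axis=1)
    return d1, d2


def repeated_kfold(n_samples, n_folds=4, n_repeats=100, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated shuffled k-fold CV."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        order = rng.permutation(n_samples)
        for test_idx in np.array_split(order, n_folds):
            train_idx = np.setdiff1d(order, test_idx)
            yield train_idx, test_idx
```

For example, with 52 samples `list(repeated_kfold(52))` produces 400 splits of 39 training and 13 test samples each, and `derivative_spectra(R, np.linspace(450.0, 2480.0, 126))` returns arrays of the same shape as `R`. The evenly spaced grid is only a placeholder for the actual HyMap band centres.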

4. RESULTS AND DISCUSSIONS

The validation results of the models developed with the 52 data sets from 2011 and with the 30 data sets from 2008 are shown in Table 2, for reflectance, first-derivative, and second-derivative data. In both the 2011 and 2008 data, similar wavelength bands were selected for yield prediction by the BiPLS process. For the 2008 data, the coefficient of determination of the prediction model from first-derivative data is higher, and its RMSE lower, than those from reflectance data and second-derivative data. For the 2011 data, on the other hand, the prediction model with second-derivative data had the highest prediction performance: its coefficient of determination and RMSE are better than those from reflectance and first-derivative data. However, the first- and second-derivative models may be overfitted, because the number of bands selected for them is much larger than for reflectance data. Generally, BiPLS has a low risk of overfitting, but in this case the large number of selected bands indicates that the risk is high. Therefore, although the RMSE of reflectance data is not better than those of the first and second derivatives, reflectance data carries the lowest risk of overfitting. For this reason, reflectance data is best for prediction. Nevertheless, it is still difficult to judge which is better for prediction, and the predictions need to be verified using data from many years.

Regarding prediction accuracy, the accuracy was high in both the 2008 and 2011 cases, and the difference in accuracy between 2008 and 2011 depends on data quality and the variation in the training data.

Table 2. Comparison between the prediction accuracy of 2011 data and 2008 data (training data: 2011 data, average of 100 cross-validation runs on 2011 data; training data: 2008 data, average of 100 cross-validation runs on 2008 data)

| | 2011: Reflectance | 2011: First derivative | 2011: Second derivative | 2008: Reflectance | 2008: First derivative | 2008: Second derivative |
|---|---|---|---|---|---|---|
| Coefficient of determination | 0.672 | 0.692 | 0.745 | 0.2421 | 0.6045 | 0.5469 |
| RMSE [t/ha] | 0.772 | 0.761 | 0.697 | 0.3906 | 0.2760 | 0.2996 |
| Number of selected bands | 6 | 17 | 13 | 6 | 6 | 7 |
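The paper does not spell out the two accuracy measures reported in Table 2. Under the conventional definitions, which are assumed here, with $y_i$ the measured yield, $\hat{y}_i$ the predicted yield, $\bar{y}$ the mean measured yield, and $n$ the number of validation samples (all symbols introduced for illustration), they read:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```

Some studies instead report the squared correlation between predicted and measured values as the coefficient of determination. The two agree for an ordinary least-squares fit evaluated on its own training data but can differ under cross-validation.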

Furthermore, we applied BiPLS to the reflectance data and yield data of Case 1 and Case 2 in Table 1 to validate prediction accuracy. Predicted and measured values for both cases are compared in Fig. 2, and the RMSE between predicted and measured data in both cases is shown in Table 3. These results show that predicting the 2008 yield from 2011 data alone is not accurate enough, and that prediction accuracy improves when some 2008 data are used together with the 2011 data.

The prediction model of Case 2 was applied to HyMap data to map the estimated yield at Karawang, as shown in Fig. 3. The resulting map is consistent with the statistical values from the local government.

Table 3. RMSE of Case 1 and Case 2

| Training data | Case 1 (2011 data) | Case 2 (2008 and 2011 data) |
|---|---|---|
| RMSE [t/ha] | 1.560 | 0.584 |
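As a worked check on the Table 3 values, the relative reduction in RMSE from Case 1 to Case 2 is:

```latex
\frac{1.560 - 0.584}{1.560} = \frac{0.976}{1.560} \approx 0.626
```

Adding the 2008 samples to the training set therefore lowers the RMSE on the 2008 validation samples by roughly 63%.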

Fig. 2. Comparison of predicted value with measured value

Fig. 3. Yield prediction map in Karawang


5. CONCLUSION

This paper has shown that a yield prediction model built by applying BiPLS to hyperspectral data can estimate yield with high accuracy. Experimental results showed that the proposed approach performs better for prediction with reflectance data than with first- or second-derivative data.

When two years of data are available for building the yield prediction model, the accuracy is better than with data from only one year. This shows the possibility of building a robust prediction model from multiple years of data. Continued acquisition of HyMap data and field data in different years will provide more case studies for validating the robustness of the yield prediction model.

Using the yield information provided by this method, the Indonesian government can know the paddy yield in its areas of interest.

6. REFERENCES

[1] R. Oktaviani, S. Amaliah, C. Ringler, M.W. Rosegrant, and T.B. Sulser, “The Impact of Global Climate Change on the Indonesian Economy,” International Food Policy Research Institute, 2011.
[2] P. Simatupang, and C.P. Timmer, “Rice Production: Policies and Realities,” Bulletin of Indonesian Economic Studies, 44(1), pp. 65-80, 2008.
[3] P.S. Thenkabail, A.D. Ward, and J.G. Lyon, “Landsat-5 Thematic Mapper models of soybean and corn crop characteristics,” International Journal of Remote Sensing, 15, pp. 49-61, 1994.
[4] T.I.R. Almeida, C.R. De Souza Filho, and R. Rossetto, “ASTER and Landsat ETM+ images applied to sugarcane yield forecast,” International Journal of Remote Sensing, 27, pp. 4057-4069, 2006.
[5] P.S. Thenkabail, R.B. Smith, and E. De Pauw, “Hyperspectral Vegetation Indices and Their Relationships with Agricultural Crop Characteristics,” Remote Sensing of Environment, 71, pp. 158-182, 2000.
[6] S.L. Osborne, J.S. Schepers, D.D. Francis, and M.R. Schlemmer, “Use of Spectral Radiance to Estimate In-Season Biomass and Grain Yield in Nitrogen- and Water-Stressed Corn,” Crop Science, 42, pp. 165-171, 2002.
[7] A. Uchida, H. Sekine, K. Fukuhara, K. Yoshida, T. Takayama, C. Kobayashi, O. Kashimura, S. Muljono, Arief D., M. Evri, and M. Sadly, “Development of Growth Stage Classification Method for Paddy Fields by Using Sparse Linear Discriminant Analysis,” Asian Conference on Remote Sensing (ACRS), 2011.
[8] K. Yoshida, T. Ohki, M. Terabe, H. Sekine, and T. Takeda, “A Methodology of Forest Monitoring From Hyperspectral Images With Sparse Regularization,” Geoscience and Remote Sensing Symposium (IGARSS), 2011.
[9] M. Shibayama, and T. Akiyama, “Estimating Grain Yield of Maturing Rice Canopies Using High Spectral Resolution Reflectance Measurements,” Remote Sensing of Environment, 36, pp. 45-53, 1991.
[10] M. Reyniers, E. Vrindts, and J.D. Baerdemaeker, “Optical Measurement of Crop Cover for Yield Prediction of Wheat,” Biosystems Engineering, 89, pp. 383-394, 2004.
[11] J. Gallego, M. Craig, J. Michaelsen, B. Bossyns, and S. Fritz, “Best practices for crop area estimation with Remote Sensing,” JRC Scientific and Technical Reports, 2010.
[12] R. Leardi, and L. Nørgaard, “Sequential application of backward interval partial least squares and genetic algorithms for the selection of relevant spectral regions,” Journal of Chemometrics, 18, pp. 486-497, 2004.
