Rice yield prediction in South Korea by using random forest

  • Kim, Junhwan (Division of Crop Physiology and Production, National Institute of Crop Science, Rural Development Administration) ;
  • Lee, Juseok (Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology) ;
  • Sang, Wangyu (Division of Crop Physiology and Production, National Institute of Crop Science, Rural Development Administration) ;
  • Shin, Pyeong (Division of Crop Physiology and Production, National Institute of Crop Science, Rural Development Administration) ;
  • Cho, Hyeounsuk (Division of Crop Physiology and Production, National Institute of Crop Science, Rural Development Administration) ;
  • Seo, Myungchul (Division of Crop Physiology and Production, National Institute of Crop Science, Rural Development Administration)
  • Received : 2019.05.26
  • Accepted : 2019.06.17
  • Published : 2019.06.30

Abstract

In this study, the random forest approach was used to predict the national mean rice yield of South Korea from climatic factors averaged at the national scale. A random forest model built on monthly climate variables and year identified year as an important predictor of crop yield. Annual changes in yield are affected by technical improvements in crop management as well as by climate, and year as a predictor represents those technical improvements. Thus, the variables of importance identified by this random forest model are likely to result in a large error when predicting rice yield in practice. Eliminating the trend from the yield data, in contrast, resulted in reasonable accuracy in yield prediction with the random forest model. For example, yield prediction for the training set (data from 1991 to 2005) showed a relatively high degree of agreement. Although the agreement statistics for the test set (2006-2015) were not as good as those for the training set, the relative root mean square error (RRMSE) was less than 5%. In the variable importance plots, a significant difference was noted in the importance of climate factors between the training and test sets. This difference could be attributed to a shift in the transplanting date, which might have affected the growing season. These results suggest that acceptable yield prediction can be achieved with random forest when the data set includes consistent planting or transplanting dates in the predicted area.
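The abstract reports RRMSE without giving its formula. The sketch below assumes the common definition, RMSE divided by the mean of the observed values and expressed as a percentage; the function name `rrmse` and the sample yields are illustrative only and are not taken from the paper.

```python
import math


def rrmse(observed, predicted):
    """Relative RMSE in percent: 100 * RMSE / mean(observed).

    Assumed definition; the source does not state the formula it used.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    mean_obs = sum(observed) / len(observed)
    return 100.0 * math.sqrt(mse) / mean_obs


# Made-up yields in arbitrary units (not data from the paper).
obs = [500.0, 510.0, 490.0, 500.0]
pred = [505.0, 505.0, 495.0, 495.0]
print(round(rrmse(obs, pred), 2))  # 1.0
```

Under this definition, an RRMSE below 5% means the typical prediction error is less than one twentieth of the mean observed yield.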

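The aim of this study was to predict the national mean rice yield of South Korea with random forest, using only climatic factors. Random forest can isolate the contribution of each predictor variable, and the time-series trend isolated in this way showed an abnormal increasing pattern. Because this ultimately degrades predictive ability, the trend needed to be removed; in this study it was removed with a moving average before prediction. Climate and yield data from 1991 to 2005 were used for training, and data from 2006 to 2015 were used for testing. The model predicted the training data quite accurately, but not the test data. To analyze the reason, variable importance was calculated separately for the training and test data and compared, which revealed that the importance of the monthly climate data had changed between the two data sets. This difference arose because the standard transplanting date shifted nationwide between the training and test periods, changing the growing period of rice itself. Therefore, accurate prediction requires data on regional sowing or transplanting dates, and prediction from climate data alone appears to be difficult.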
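The moving-average detrending mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a trailing window that shrinks at the start of the series, since the source does not say whether the window was trailing or centered. The names `moving_average` and `detrend` are hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average; the window shrinks at the start of the series."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)  # first index inside the window
        chunk = values[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def detrend(values, window):
    """Split a series into a moving-average trend and the residual fluctuation."""
    trend = moving_average(values, window)
    residual = [v - t for v, t in zip(values, trend)]
    return trend, residual


# Made-up series (not data from the paper).
series = [1.0, 2.0, 3.0, 4.0, 5.0]
trend, residual = detrend(series, window=3)
print(trend)     # [1.0, 1.5, 2.0, 3.0, 4.0]
print(residual)  # [0.0, 0.5, 1.0, 1.0, 1.0]
```

In such a scheme the model would be trained on the residual fluctuation, and a yield prediction would be recovered by adding the trend back to the predicted residual.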

Keywords

Fig. 1. Performance of the random forest model in predicting national mean rice yield of South Korea with the training data (1991-2005).

Fig. 2. Time series comparison between observed and predicted rice yield in South Korea from 1991 to 2005.

Fig. 3. Variable importance plot from random forest for rice yield prediction with the training data (1991-2005). Max temp: monthly maximum temperature, Mean temp: monthly mean temperature, Min temp: monthly minimum temperature, Mean sh: monthly mean sunshine hours. %IncMSE: increase in mean squared error when the variable is permuted; a higher %IncMSE indicates a more important variable.
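The idea behind a permutation-based importance measure such as %IncMSE can be illustrated with a short sketch. This is a simplified version under stated assumptions: it shuffles one predictor at a time and reports the mean increase in MSE on a given data set, whereas random forest packages typically compute the measure on out-of-bag samples per tree and may rescale it. The function names and the toy model are hypothetical.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of a callable model over rows of predictors."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Mean increase in MSE after shuffling each predictor column in turn."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    increases = []
    for j in range(len(rows[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between predictor j and the target
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            total += mse(model, permuted, targets) - baseline
        increases.append(total / n_repeats)
    return increases


# Toy model that depends only on the first predictor (illustrative, not a forest).
toy_model = lambda r: 2.0 * r[0]
rows = [[float(i), float(i % 3)] for i in range(12)]
targets = [toy_model(r) for r in rows]
importance = permutation_importance(toy_model, rows, targets)
print(importance)
```

Shuffling the first predictor raises the error, while shuffling the second leaves it unchanged, so the first ranks as more important.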

Fig. 4. Partial dependence plot for the top-ranked predictor variable, year, according to the variable importance measures of the random forest model with the training data (1991-2005).
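A partial dependence curve of the kind shown in Fig. 4 can be computed as sketched below: one predictor is fixed at each grid value for every row, and the model's predictions are averaged. This is a generic illustration with a hypothetical toy model, not the authors' code.

```python
def partial_dependence(model, rows, feature, grid):
    """Average prediction when predictor `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite the chosen predictor in every row, keep the others as observed.
        preds = [model(r[:feature] + [value] + r[feature + 1:]) for r in rows]
        curve.append(sum(preds) / len(preds))
    return curve


# Toy model and data (illustrative only).
toy = lambda r: 3.0 * r[0] + r[1]
data = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
curve = partial_dependence(toy, data, feature=0, grid=[0.0, 1.0, 2.0])
print(curve)  # [2.0, 5.0, 8.0]
```

A curve that rises steadily with year, as in this toy case with the first predictor, indicates that the model has absorbed a trend through that variable.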

Fig. 5. Observed national mean yield in South Korea and its 10-year, 5-year, and 3-year moving averages.

Fig. 6. Variable importance plot from random forest for yield fluctuation prediction with the detrended training data (1991-2005). Max temp: monthly maximum temperature, Mean temp: monthly mean temperature, Min temp: monthly minimum temperature, Mean sh: monthly mean sunshine hours. %IncMSE: increase in mean squared error when the variable is permuted; a higher %IncMSE indicates a more important variable.

Fig. 7. Time series comparison between observed rice yield and predicted rice yield in South Korea with train data (1991-2005) and test data (2006-2015).

Fig. 8. Performance of the random forest model in predicting national mean rice yield of South Korea with the test data (2006-2015).

Fig. 9. Variable importance comparison between the training set (1991-2005) and the test set (2006-2015).

Table 1. Comparison of model predictive performance when trend is removed by using the moving average
