Estimating Farmland Prices Using Distance Metrics and an Ensemble Technique

  • Lee, Chang-Ro (Institute for Korean Regional Studies, Seoul National University) ;
  • Park, Key-Ho (Department of Geography, Seoul National University, Institute for Korean Regional Studies)
  • Received: 2016.08.23
  • Reviewed: 2016.12.07
  • Published: 2016.12.10

Abstract


This study estimated land prices using instance-based learning. Among various instance-based learning methods, the k-nearest neighbor method was employed, and similarity was measured with ten distance metrics that appear relatively often in the literature, including Euclidean distance. Rather than selecting the single best-performing of the ten k-nearest neighbor predictions as the final estimate, this study applied an ensemble technique, which combines multiple predictions to obtain better performance. The gradient boosting algorithm, a kind of residual-fitting model, was used to combine the predictions into a final price. Sales price data for farmland in Haenam-gun, South Jeolla Province were used to demonstrate the advantages of instance-based learning and the ensemble technique. The results showed that the ensemble prediction was more accurate than any of the ten individual distance metric predictions.
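The pipeline described in the abstract — base k-NN predictions under several distance metrics, combined by a gradient boosting model — can be sketched as below. This is a minimal illustration on synthetic data (not the Haenam-gun sales records), using five of scikit-learn's built-in metrics rather than the paper's ten; in practice the combiner would be trained on out-of-fold base predictions to avoid leakage.

```python
# Sketch: k-NN regressors under several distance metrics produce base
# predictions, which a gradient boosting model then combines.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for parcel attributes and sale prices.
X, y = make_regression(n_samples=400, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A few of the distance metrics scikit-learn supports out of the box;
# the paper compares ten, including Euclidean.
metrics = ["euclidean", "manhattan", "chebyshev", "canberra", "braycurtis"]

def knn_predictions(X_fit, y_fit, X_eval):
    """Stack one prediction column per distance metric."""
    cols = []
    for m in metrics:
        knn = KNeighborsRegressor(n_neighbors=5, metric=m).fit(X_fit, y_fit)
        cols.append(knn.predict(X_eval))
    return np.column_stack(cols)

# Train the gradient boosting combiner on the base predictions.
Z_train = knn_predictions(X_train, y_train, X_train)
Z_test = knn_predictions(X_train, y_train, X_test)
gbm = GradientBoostingRegressor(random_state=0).fit(Z_train, y_train)
ensemble_pred = gbm.predict(Z_test)  # one combined estimate per test parcel
```

The combiner learns how much weight each metric's prediction deserves, which is why the ensemble can outperform every individual metric.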

Keywords

References

  1. Kim MH, Lee SH, Shin DH. 2015. Predictability Test of K-Nearest Neighbors Algorithm: Application to the KOSPI 200 Futures. Korea Business Management Journal. 28(10):2613-2633.
  2. Kim HJ, Kim HD. 2014. Predicting Loan Defaults with Gradient Boosting and Balanced Classification. Journal of Advanced Information Technology and Convergence. 12(1):155-164.
  3. Lee SJ, Kim SO. 2007. Pre-evaluation for Prediction Accuracy by Using the Customer's Ratings in Collaborative Filtering. Asia Pacific Journal of Information Systems. 17(4):187-206.
  4. Jang HS, Bang KS. 2014. Real Estate Dictionary. Buyeonsa.
  5. Aha DW, Kibler D, Albert MK. 1991. Instance-based Learning Algorithms. Machine Learning. 6(1):37-66.
  6. Alfaro E, García N, Gámez M, Elizondo D. 2008. Bankruptcy Forecasting: An Empirical Comparison of AdaBoost and Neural Networks. Decision Support Systems. 45(1):110-122. https://doi.org/10.1016/j.dss.2007.12.002
  7. Banfield RE. 2007. A Comparison of Decision Tree Ensemble Creation Techniques. IEEE Transactions on Pattern Analysis and Machine Intelligence. 29(1):173-180. https://doi.org/10.1109/TPAMI.2007.250609
  8. Chopra S, Hadsell R, LeCun Y. 2005. Learning a Similarity Metric Discriminatively, with Application to Face Verification. In: Computer Vision and Pattern Recognition. Proceedings of a Conference Held by IEEE Computer Society; 2005 Jun 20; San Diego (CA); 2005. Vol. 1. p. 539-546.
  9. Fanelli G, Dantone M, Gall J, Fossati A, Van Gool L. 2013. Random Forests for Real Time 3D Face Analysis. International Journal of Computer Vision. 101(3):437-458. https://doi.org/10.1007/s11263-012-0549-0
  10. Friedman JH. 2001. Greedy Function Approximation: a Gradient Boosting Machine. Annals of Statistics. 29(5): 1189-1232. https://doi.org/10.1214/aos/1013203451
  11. Gama J, Camacho R, Brazdil P, Jorge A, Torgo L. 2005. Machine Learning: ECML 2005. Proceedings of a symposium held at the 16th European Conference on Machine Learning; 2005 Oct 3-7; Porto, Portugal; 2005. p. 601-608.
  12. Kuhn M, Johnson K. 2013. Applied Predictive Modeling. New York: Springer, p. 389-400.
  13. Legendre P, Legendre LF. 2012. Numerical Ecology. London: Elsevier, p. 296-298.
  14. Lemmens A, Croux C. 2006. Bagging and Boosting Classification Trees to Predict Churn. Journal of Marketing Research. 43(2):276-286. https://doi.org/10.1509/jmkr.43.2.276
  15. Li P, Wu Q, Burges CJ. 2007. McRank: Learning to Rank Using Multiple Classification and Gradient Boosting. In: Proceedings of a symposium held at the 21st Annual Conference on Neural Information Processing Systems; 2007 Dec 3-5; Vancouver (BC); 2007. p. 897-904.
  16. Liao Y, Vemuri VR. 2002. Use of K-Nearest Neighbor Classifier for Intrusion Detection. Computers & Security. 21(5):439-448. https://doi.org/10.1016/S0167-4048(02)00514-X
  17. Park B, Bae JK. 2015. Using Machine Learning Algorithms for Housing Price Prediction: The Case of Fairfax County, Virginia Housing Data. Expert Systems with Applications. 42(6):2928-2934. https://doi.org/10.1016/j.eswa.2014.11.040
  18. Quinlan JR. 1993. Combining Instance-based and Model-based Learning. In: Proceedings of a symposium held at the 10th International Conference on Machine Learning; 1993 Jun 27-29; Amherst (MA); 1993. p. 236-243.
  19. Rasyidi MA, Kim J, Ryu KR. 2014. Short-Term Prediction of Vehicle Speed on Main City Roads Using the K-Nearest Neighbor Algorithm. Journal of Intelligence and Information Systems. 20(1):121-131.
  20. Schapire RE. 1999. Theoretical Views of Boosting. In: Proceedings of a symposium held at the 4th European Conference on Computational Learning Theory (EuroCOLT); 1999 Mar 29-31; Nordkirchen, Germany; 1999. p. 1-10.
  21. Shen H, Chou KC. 2005. Using Optimized Evidence-Theoretic K-Nearest Neighbor Classifier and Pseudo-amino Acid Composition to Predict Membrane Protein Types. Biochemical and Biophysical Research Communications. 334(1):288-292. https://doi.org/10.1016/j.bbrc.2005.06.087
  22. Wang YQ. 2008. Building Credit Scoring Systems Based on Support-based Support Vector Machine Ensemble. In: Proceedings of a symposium held at the 4th International Conference on Natural Computation; 2008 Oct 18-20; Jinan, China; 2008. p. 323-326.
  23. Weinberger KQ, Blitzer J, Saul LK. 2009. Distance Metric Learning for Large Margin Nearest Neighbor Classification. Journal of Machine Learning Research. 10: 207-244.