Apartment Price Prediction Using Deep Learning and Machine Learning

딥러닝과 머신러닝을 이용한 아파트 실거래가 예측

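  • 김학현 (School of Electronic and Electrical Engineering, Sungkyunkwan University)
  • 유환규 (Department of Mechanical Engineering, Hanyang University)
  • 오하영 (School of Global Convergence, Sungkyunkwan University)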
  • Received : 2022.04.08
  • Accepted : 2022.10.13
  • Published : 2023.02.28

Abstract

Since the onset of the COVID-19 pandemic, apartment prices have risen abnormally. In such an uncertain real estate market, price prediction research is very important. In this paper, we build a data set of 870,000 records covering 2015 to 2020 by collecting and crawling data from various real estate sites, gathering as many variables as possible, including apartment information and economic indicators, and then create a model that predicts future actual transaction prices of apartments. We first address multicollinearity by removing and combining variables. We then apply five variable selection algorithms to extract meaningful independent variables: Forward Selection, Backward Elimination, Stepwise Selection, L1 Regularization, and Principal Component Analysis (PCA). Next, four machine learning and deep learning algorithms, namely a deep neural network (DNN), XGBoost, CatBoost, and Linear Regression, are trained after hyperparameter optimization, and their predictive power is compared. In an additional experiment, we vary the number of nodes and layers of the DNN to find the most appropriate configuration. Finally, the best-performing model is used to predict actual apartment transaction prices for 2021, and the predictions show excellent agreement with the actual 2021 data. We are confident that machine learning and deep learning can help investors make sound decisions when purchasing homes under various economic conditions.
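The first two steps described in the abstract, screening out collinear variables and then running forward selection, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the correlation threshold, the selection criterion, or the variable names, so the 0.9 cutoff, the residual-sum-of-squares criterion with a 5% minimum gain, the column names, and the synthetic data are all hypothetical. It uses only the Python standard library.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_collinear(columns, threshold=0.9):
    """Keep a column only if |r| with every already-kept column is below threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rss(columns, names, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus `names`."""
    rows = [[1.0] + [columns[k][i] for k in names] for i in range(len(y))]
    p = len(names) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))


def forward_selection(columns, candidates, y, min_gain=0.05):
    """Greedily add the variable that lowers RSS most; stop when the relative gain is small."""
    selected, best = [], rss(columns, [], y)
    remaining = list(candidates)
    while remaining:
        scores = {k: rss(columns, selected + [k], y) for k in remaining}
        k = min(scores, key=scores.get)
        if best - scores[k] < min_gain * best:
            break
        selected.append(k)
        remaining.remove(k)
        best = scores[k]
    return selected


# Synthetic stand-in data (hypothetical; not the paper's data set).
random.seed(0)
n = 200
area = [random.uniform(40, 150) for _ in range(n)]
age = [random.uniform(0, 30) for _ in range(n)]
columns = {
    "area_m2": area,
    "area_pyeong": [a / 3.3058 for a in area],  # same quantity in another unit
    "building_age": age,
    "noise": [random.gauss(0, 1) for _ in range(n)],
}
price = [3.0 * a - 2.0 * g + random.gauss(0, 5) for a, g in zip(area, age)]

kept = drop_collinear(columns)
selected = forward_selection(columns, kept, price)
print("after collinearity screen:", kept)
print("after forward selection:", selected)
```

Backward Elimination and Stepwise Selection follow the same pattern with the greedy step reversed or alternated. On a data set of the size the abstract describes, a numerical library would be a more practical choice than the hand-written solver used here to keep the sketch self-contained.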

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2022R1F1A1074696).
