Classifying Sub-Categories of Apartment Defect Repair Tasks: A Machine Learning Approach

A Study on a Machine Learning Classification System for Sub-Categories of Apartment Defect Repair Facility Work

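  • 김은혜 (Department of Applied Data Science, Sungkyunkwan University)
  • 지홍근 (Department of Applied Artificial Intelligence, Sungkyunkwan University)
  • 김지나 (Department of Interaction Science, Sungkyunkwan University)
  • 박은일 (Department of Applied Artificial Intelligence, Sungkyunkwan University)
  • 엄재용 (Graduate School of Innovation and Technology Management, Korea Advanced Institute of Science and Technology)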
  • Received : 2021.02.02
  • Accepted : 2021.05.01
  • Published : 2021.09.30

Abstract

Construction companies in Korea invest considerable human and financial resources in systems that manage apartment defect data and categorize repair tasks. To reduce this burden, this study proposes machine learning models that automatically classify defect complaint text into the sub-categories of 'finishing work', one of the defect repair task categories. The models combine two word representation methods (Bag-of-words and Term Frequency-Inverse Document Frequency (TF-IDF)) with two machine learning classifiers (Support Vector Machine and Random Forest). We conducted both binary and multi-class classification over nine sub-categories of finishing work: home appliance installation work, wallpapering work, painting work, plastering work, interior masonry work, plaster finishing work, indoor furniture installation work, kitchen facility installation work, and tiling work. The models that pair the TF-IDF representation with the Random Forest classifier achieved more than 90% accuracy, precision, recall, and F1 score. These results indicate that automated defect classification systems can be built on the proposed machine learning models.
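As a rough illustration of the two word representations named in the abstract, the sketch below computes Bag-of-words counts and TF-IDF weights in plain Python. The sample complaints, the whitespace tokenization, the function names, and the specific IDF formula (ln(N/df) + 1) are assumptions made for illustration; the abstract does not state the exact weighting or tokenizer used in the study.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Return raw term counts for one tokenized document."""
    return Counter(tokens)


def tfidf_vectors(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    Assumed variant: tf is the raw count and idf = ln(N / df) + 1.
    """
    n_docs = len(docs)
    counts = [bag_of_words(doc) for doc in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for c in counts for term in c)
    idf = {term: math.log(n_docs / d) + 1.0 for term, d in df.items()}
    return [{term: tf * idf[term] for term, tf in c.items()} for c in counts]


# Hypothetical complaint snippets, not taken from the study's data.
complaints = [
    "wallpaper torn near window".split(),
    "tile cracked in bathroom".split(),
    "wallpaper stain in bedroom".split(),
]
vectors = tfidf_vectors(complaints)
```

In this toy corpus, a term that occurs in a single complaint, such as `torn`, receives a larger weight than a term shared across complaints, such as `wallpaper`. The resulting sparse vectors are the kind of input a Support Vector Machine or Random Forest classifier would consume.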

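Construction companies in Korea invest considerable manpower and cost in operating systems that accumulate apartment defect information and manage repair work. This study proposes machine learning models that use the free-text details of defect reports to classify the sub-category of repair work to which each report belongs. Using two word representations (Bag-of-words and Term Frequency-Inverse Document Frequency (TF-IDF)) and two classifiers (Support Vector Machine and Random Forest), we classified repair-work sub-categories from more than 650,000 defect reports written in Korean. In particular, we studied binary and multi-class classification models for the nine sub-categories of one type of facility work, finishing work: home appliances, wallpapering, painting, plastering, masonry, plaster finishing, indoor furniture, kitchen facilities, and tiling. Both classification models that used TF-IDF with Random Forest achieved accuracy, precision, recall, and F1 score above 90%.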
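The four reported measures can be computed from predicted and true labels as sketched below, both for a one-vs-rest (binary) view of a single sub-category and for the multi-class case. The labels and function names are hypothetical, and macro-averaging across classes is an assumption; the abstract does not say which averaging scheme the study used.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall, and F1 treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def multiclass_metrics(y_true, y_pred):
    """Overall accuracy plus macro-averaged precision, recall, and F1 (assumed scheme)."""
    labels = sorted(set(y_true))
    per_class = [one_vs_rest_metrics(y_true, y_pred, c) for c in labels]
    result = {
        key: sum(m[key] for m in per_class) / len(per_class)
        for key in ("precision", "recall", "f1")
    }
    # Multi-class accuracy is the fraction of exactly matching labels.
    result["accuracy"] = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return result


# Hypothetical labels for six complaints, not results from the study.
y_true = ["tiling", "tiling", "painting", "wallpapering", "painting", "tiling"]
y_pred = ["tiling", "painting", "painting", "wallpapering", "painting", "tiling"]

tiling_scores = one_vs_rest_metrics(y_true, y_pred, "tiling")
overall_scores = multiclass_metrics(y_true, y_pred)
```

Reporting all four measures matters when sub-categories are unevenly represented, because accuracy alone can stay high while recall on a rare sub-category is poor.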

Acknowledgement

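This work was supported by the ICT Innovation Talent 4.0 program (IITP-2020-0-01816) of the Ministry of Science and ICT / Institute of Information & Communications Technology Planning & Evaluation (IITP), and by the Basic Research Program of the National Research Foundation of Korea (NRF), funded by the Korean government (Ministry of Education) in 2019 (No. NRF-2019R1I1A2A01058640).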
