Analyzing Project Similarity in Korean Bidding Documents Using BERT

  • Inwoo Jung (Department of Future & Smart Construction Research, Korea Institute of Civil Engineering and Building Technology)
  • Hyunseok Moon (Department of Future & Smart Construction Research, Korea Institute of Civil Engineering and Building Technology)
  • Jeongsoo Kim (Department of Future & Smart Construction Research, Korea Institute of Civil Engineering and Building Technology)
  • Published: 2024.07.29

Abstract

Bidding documents carry valuable project information in text form. When preparing a new bid, practitioners need to consult documents from similar past bidding projects to understand the requirements and characteristics of the project. Manually comparing and analyzing these documents, however, is time-consuming and costly. The difficulty grows when the documents involve emerging technologies such as Building Information Modeling (BIM), because comparing them requires deeper expertise. To address this gap, this study develops a BERT-based approach for assessing project similarity in Korean bidding documents. A two-stage strategy was adopted: 1) development of a Korean tokenizer for BIM-related bidding documents, and 2) word embedding with BERT followed by project similarity analysis using cosine similarity. The resulting model automatically evaluates each project and identifies the most similar one. By matching target projects with the most suitable benchmark projects, this research can help practitioners make more accurate and timely decisions.
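The cosine-similarity ranking in the second stage can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cosine_similarity` and `rank_similar_projects`, the project identifiers, and the toy three-dimensional vectors are all hypothetical, and the vectors merely stand in for document embeddings that a BERT model would produce.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar_projects(target, candidates):
    """Return (project_id, score) pairs sorted from most to least similar."""
    scored = [
        (project_id, cosine_similarity(target, vector))
        for project_id, vector in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy vectors standing in for BERT embeddings of past projects.
past_projects = {
    "project_a": [0.9, 0.1, 0.0],
    "project_b": [0.1, 0.8, 0.3],
    "project_c": [0.7, 0.3, 0.1],
}
# Hypothetical embedding of the new bidding document.
new_bid = [0.8, 0.2, 0.1]

ranking = rank_similar_projects(new_bid, past_projects)
for project_id, score in ranking:
    print(f"{project_id}: {score:.3f}")
```

The first entry of `ranking` is the past project whose embedding points in the direction closest to the new bid. Cosine similarity ignores vector length, so a long and a short document with similar content can still score close to each other.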
