Research on ITB Contract Terms Classification Model for Risk Management in EPC Projects: Deep Learning-Based PLM Ensemble Techniques

Hyunsang Lee; Wonseok Lee; Bogeun Jo; Heejun Lee; Sangjin Oh; Sangwoo You; Maru Nam; Hyunsik Lee

doi:10.3745/KTSDE.2023.12.11.471

KIPS Transactions on Software and Data Engineering (정보처리학회논문지:소프트웨어 및 데이터공학)

Volume 12 Issue 11 / Pages 471-480 / 2023 / 2287-5905 (pISSN) / 2734-0503 (eISSN)

Korea Information Processing Society (한국정보처리학회)

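Hyunsang Lee (Data Analysis Team, Bigwave AI Co., Ltd.);
Wonseok Lee (Bigwave AI Co., Ltd.);
Bogeun Jo (Bigwave AI Co., Ltd.);
Heejun Lee (Bigwave AI Co., Ltd.);
Sangjin Oh (Smart ICT Team, Hyundai Engineering Co., Ltd.);
Sangwoo You (Smart ICT Team, Hyundai Engineering Co., Ltd.);
Maru Nam (Overseas Legal Team, Hyundai Engineering Co., Ltd.);
Hyunsik Lee (Smart ICT Team, Hyundai Engineering Co., Ltd.)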

Received : 2023.08.02
Accepted : 2023.09.04
Published : 2023.11.30

Abstract

South Korea's construction orders grew from 91.3 trillion won in 2013 to a total of 212 trillion won in 2021, with the strongest growth in the private sector. As domestic and overseas markets expanded, EPC (Engineering, Procurement, Construction) projects grew in scale and complexity, making project management and the risk management of ITB (Invitation to Bid) documents critical issues. In the bidding process that follows an EPC project tender, construction companies have limited time to respond, and manpower and cost constraints make it extremely difficult to review every risk in the contract terms of an ITB document. Previous research attempted to categorize the risk terms in EPC contract documents and detect them with AI, but data-related problems, such as the limited availability of labeled data and class imbalance, kept those approaches from practical use. This study therefore aims to develop an AI model that classifies contract terms in detail according to the FIDIC Yellow 2017 (Federation Internationale Des Ingenieurs-Conseils contract terms) standard, rather than defining and classifying risk terms as previous research did. A multi-text classification function is needed because the contract terms that require detailed review vary with the scale and type of the project. To improve the performance of the multi-text classification model, we developed an ELECTRA PLM (Pre-trained Language Model), an architecture that learns the context of text data efficiently, starting from the pre-training stage, and ran a four-step experiment to validate the model. The ensemble of the self-developed ITB-ELECTRA model and Legal-BERT performed best, with a weighted average F1-Score of 76% in the classification of 57 contract terms.
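The two technical ingredients named in the abstract, an ensemble of two classifiers and a weighted average F1-Score, can be illustrated with a minimal sketch. The abstract does not state how the ensemble combines ITB-ELECTRA and Legal-BERT, so soft voting (averaging class probabilities) is an assumption here, as are the function names `soft_vote` and `weighted_f1` and the toy probabilities, which are not results from the paper.

```python
from collections import Counter


def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Blend two models' class-probability rows and return the argmax class per sample.

    Soft voting is an assumed combination rule; the abstract does not specify one.
    """
    preds = []
    for row_a, row_b in zip(prob_a, prob_b):
        mixed = [weight_a * pa + (1.0 - weight_a) * pb for pa, pb in zip(row_a, row_b)]
        preds.append(max(range(len(mixed)), key=mixed.__getitem__))
    return preds


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's number of true samples."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Toy data: 4 clauses, 3 classes (the paper uses 57 classes; these numbers are invented).
y_true = [0, 1, 2, 2]
probs_model_a = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.2, 0.2, 0.6], [0.5, 0.1, 0.4]]
probs_model_b = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4], [0.2, 0.2, 0.6]]

ensemble_pred = soft_vote(probs_model_a, probs_model_b)
print(ensemble_pred, weighted_f1(y_true, ensemble_pred))
```

In this toy case the first model alone misclassifies the last clause, while the averaged probabilities recover the correct class. Weighting each class's F1 by its support means frequent clause types dominate the score, which matters when classes are imbalanced.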
