A Study on the Construction of Financial-Specific Language Model Applicable to the Financial Institutions

  • Jae Kwon Bae (Department of Management Information Systems, Keimyung University)
  • Received : 2024.06.04
  • Accepted : 2024.06.13
  • Published : 2024.06.30

Abstract

Recently, the importance of pre-trained language models (PLMs) for natural language processing (NLP) tasks such as text classification, sentiment analysis, and question answering has grown steadily. Korean PLMs perform well on general-purpose NLP but remain weak in specialized domains such as finance, manufacturing, law, and medicine. The main goal of this study is to propose a training process and fine-tuning method for building a financial domain-specific language model that performs well not only in the financial domain but also in general-purpose ones. Constructing such a model involves five steps: (1) financial data collection and preprocessing, (2) selection of a model architecture such as a PLM or foundation model, (3) domain-adaptive training and instruction tuning, (4) model verification and evaluation, and (5) model deployment and utilization. Based on this process, the study presents a method for constructing pre-training data that exploits the characteristics of the financial domain, along with adaptive learning and instruction tuning as efficient LLM training techniques.
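
The abstract describes step (3) only at a conceptual level. As a minimal, non-authoritative sketch of the domain-adaptive learning it names, the Python example below continues masked-language-model pretraining of a general-purpose Korean PLM on a financial text corpus; the base model (klue/bert-base), corpus file, and hyperparameters are illustrative assumptions, not the authors' configuration.

```python
# Hypothetical sketch of step (3), part one: domain-adaptive pretraining of a
# Korean PLM on a financial corpus. Model name, corpus path, and hyperparameters
# are illustrative assumptions, not the paper's setup.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "klue/bert-base"      # assumed general-purpose Korean PLM
CORPUS = "financial_corpus.txt"    # assumed preprocessed financial texts (step 1)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForMaskedLM.from_pretrained(BASE_MODEL)

# One document per line; tokenize into fixed-length sequences for MLM.
dataset = load_dataset("text", data_files={"train": CORPUS})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking: 15% of tokens are masked, as in standard MLM pretraining.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="finance-adapted-plm",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=3e-5,
    save_strategy="epoch",
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```

In step (4), the adapted checkpoint would then be evaluated on both financial and general-purpose benchmarks before deployment in step (5).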

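Instruction tuning, the second technique named in step (3), can likewise be sketched. The example below formats hypothetical financial instruction-response pairs with a simple prompt template and fine-tunes a small Korean causal LM on them; the model name, template, and toy data are assumptions for illustration only.

```python
# Hypothetical sketch of step (3), part two: instruction tuning, i.e. supervised
# fine-tuning of a causal LM on financial instruction-response pairs. Model name,
# prompt template, and example data are illustrative assumptions.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "skt/kogpt2-base-v2"  # assumed small Korean causal LM

# Toy instruction data; in practice these would be curated financial tasks.
pairs = [
    {"instruction": "Summarize the key risks in this quarterly report.",
     "response": "The report highlights rising credit and liquidity risk."},
    {"instruction": "Classify the sentiment of this earnings headline.",
     "response": "Negative."},
]

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # GPT-style tokenizers may lack a pad token

def format_example(ex):
    # Simple prompt template; real templates vary by project.
    text = (f"### Instruction:\n{ex['instruction']}\n\n"
            f"### Response:\n{ex['response']}{tokenizer.eos_token}")
    return tokenizer(text, truncation=True, max_length=512)

dataset = Dataset.from_list(pairs).map(
    format_example, remove_columns=["instruction", "response"]
)

# mlm=False => standard next-token (causal) language-modeling labels.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="finance-instruct-lm",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-5,
)

Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator).train()
```

In practice the two stages would be chained: the domain-adapted checkpoint from the first sketch would serve as the base model for instruction tuning.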
