Mitigating Data Imbalance in Credit Prediction using the Diffusion Model

Sangmin Oh; Juhong Lee

doi:10.30693/SMJ.2024.13.02.9

Smart Media Journal (스마트미디어저널)

Volume 13, Issue 2 / Pages 9-15 / 2024 / 2287-1322 (pISSN) / 2288-9671 (eISSN)

THE KOREAN INSTITUTE OF SMART MEDIA (한국스마트미디어학회)

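Sangmin Oh (Department of Electrical and Computer Engineering, Inha University);
Juhong Lee (Department of Electrical and Computer Engineering, Inha University)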

Received : 2023.06.09
Accepted : 2023.12.26
Published : 2024.02.29

https://doi.org/10.30693/SMJ.2024.13.02.9

Abstract

This paper proposes the Diffusion Multi-step Classifier (DMC) to address class imbalance in credit prediction. DMC uses a Diffusion Model to generate the continuous numerical features of credit prediction data, then produces the categorical features by classifying the generated data with a Multi-step Classifier. The data generated by DMC follow a distribution closer to that of the real data than the data produced by other synthetic data generation algorithms. In experiments using the generated data, the probability of predicting delinquencies increased by more than 20%, and overall predictive accuracy improved by approximately 4%. Applied in actual financial institutions, these findings are expected to contribute significantly to reducing delinquency rates and increasing profits.
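
The abstract gives no implementation details for DMC. As a rough illustration of the Diffusion Model stage only, the sketch below applies the standard closed-form forward noising step of a denoising diffusion probabilistic model to one row of standardized numeric features. The noise schedule values, the function names, and the example row are assumptions made for illustration, not the authors' settings, and the Multi-step Classifier is not sketched here.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (illustrative values)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_row(row, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) for one standardized numeric row.

    Returns the noised row and the Gaussian noise that was added. In DDPM
    training, a denoising network is fit to predict that noise from x_t and t.
    """
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in row]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(row, eps)]
    return x_t, eps


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)

# Hypothetical standardized numeric features of one credit record.
x0 = [0.8, -1.2, 0.3]
x_early, _ = noise_row(x0, 0, alpha_bars, rng)    # still close to x0
x_late, _ = noise_row(x0, 999, alpha_bars, rng)   # close to pure Gaussian noise
```

Generation runs this process in reverse: starting from Gaussian noise, a trained network removes the noise step by step until a synthetic numeric row remains. Under the abstract's description, categorical features would then be assigned to such rows by the Multi-step Classifier.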
