A Study on Fraud Detection for Electronic Prepayment Using Machine Learning

  • Choi, Byung-Ho (Department of Industrial & Information Systems, Graduate School of Public Policy and Information Technology, Seoul National University of Science & Technology)
  • Cho, Nam-Wook (Department of Industrial & Information Systems Engineering, Seoul National University of Science & Technology)
  • Received: 2022.02.24
  • Accepted: 2022.05.06
  • Published: 2022.05.31

Abstract

With the recent growth of electronic financial services, the number and value of electronic financial transactions are increasing every year, and fraud attempts against electronic prepayment are increasing with them. This paper proposes a methodology for detecting fraudulent electronic prepayment transactions with machine learning algorithms: support vector machines, decision trees, and artificial neural networks. Actual transaction data from electronic prepayment services were anonymized, collected, and preprocessed to extract the most relevant variables from the raw data. Two approaches were explored: a transaction-based approach and a user ID-based approach. For the transaction-based approach, the first model relies primarily on raw data features, while the second model adds engineered features to those of the first. The user ID-based approach also uses feature engineering to extract and transform features suited to the domain. Overall, the user ID-based approach outperformed the transaction-based approach, and within the user ID-based approach the artificial neural network performed best. The proposed method could help reduce the damage caused by electronic financial incidents by detecting and blocking fraud attempts.
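The step from the transaction-based view to the user ID-based view can be pictured as collapsing per-transaction rows into one feature row per user. The following is a minimal sketch of that idea only. The field names (`user_id`, `amount`, `hour`), the sample records, and the aggregate features are illustrative assumptions; the abstract does not list the variables actually used in the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transaction records; field names and values are illustrative
# assumptions, not the variables or data used in the paper.
transactions = [
    {"user_id": "u1", "amount": 10000, "hour": 14},
    {"user_id": "u1", "amount": 500000, "hour": 3},
    {"user_id": "u2", "amount": 20000, "hour": 11},
    {"user_id": "u1", "amount": 30000, "hour": 2},
]


def aggregate_by_user(records, night_hours=range(0, 6)):
    """Collapse transaction-level rows into one feature row per user ID."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["user_id"]].append(rec)

    features = {}
    for user_id, rows in grouped.items():
        amounts = [r["amount"] for r in rows]
        # Count transactions made during the assumed night-time window.
        night = sum(1 for r in rows if r["hour"] in night_hours)
        features[user_id] = {
            "tx_count": len(rows),
            "total_amount": sum(amounts),
            "mean_amount": mean(amounts),
            "max_amount": max(amounts),
            "night_ratio": night / len(rows),
        }
    return features


user_features = aggregate_by_user(transactions)
```

Each row of `user_features` could then be passed to a classifier such as a decision tree, a support vector machine, or a neural network, in place of the individual transaction rows used by the transaction-based models.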

Acknowledgement

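This research was supported by the Korea Institute for Advancement of Technology (KIAT) with funding from the Korean government (Ministry of Trade, Industry and Energy) in 2022 (P0017123, The Competency Development Program for Industry Specialist, 2022).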
