Predictive Analysis of Financial Fraud Detection using Azure and Spark ML

  • Received : 2018.07.05
  • Accepted : 2018.11.06
  • Published : 2018.12.31

Abstract

This paper provides insights into financial fraud detection in mobile money transaction activity. We classify each transaction as normal or fraudulent, using a small sample data set on Azure, which represents a traditional system, and the massive full data set on Spark ML, which represents a Big Data platform. In experiments with the sample data set in Azure, the Decision Forest model gave the best recall. For the massive data set in Spark ML, the Random Forest classifier proved to be the best of the classification algorithms. The Spark cluster builds and evaluates models faster as servers are added to the cluster, with the same accuracy, which shows that large-scale data sets can be used for prediction on a Big Data platform. Finally, we reached a recall of 0.73, which indicates satisfactory quality in predicting fraudulent transactions.
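Recall is the metric the abstract relies on, so a minimal sketch of how it is computed may help. The confusion-matrix counts below are hypothetical and were chosen only to illustrate the arithmetic; they are not the paper's data. The class and function names are likewise illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary fraud classifier (fraud is the positive class)."""
    true_positive: int   # fraud correctly flagged as fraud
    false_negative: int  # fraud missed and labeled normal
    false_positive: int  # normal transaction wrongly flagged as fraud
    true_negative: int   # normal transaction correctly labeled normal


def recall(c: ConfusionCounts) -> float:
    """Fraction of actual fraud cases that the model caught."""
    actual_fraud = c.true_positive + c.false_negative
    return c.true_positive / actual_fraud if actual_fraud else 0.0


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all transactions labeled correctly."""
    total = (c.true_positive + c.false_negative
             + c.false_positive + c.true_negative)
    return (c.true_positive + c.true_negative) / total if total else 0.0


# Hypothetical, heavily imbalanced counts (NOT taken from the paper):
# 100 fraudulent transactions among 100,000 in total.
counts = ConfusionCounts(
    true_positive=73,
    false_negative=27,
    false_positive=50,
    true_negative=99_850,
)

print(f"recall   = {recall(counts):.2f}")
print(f"accuracy = {accuracy(counts):.4f}")
```

With fraud this rare, accuracy stays close to 1 even when a quarter of the fraud cases are missed, which is one common reason to judge fraud classifiers by recall instead.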

Acknowledgement

This research was supported by an AWS in Education Grant award.
