Illegal Cash Accommodation Detection Modeling Using Ensemble Size Reduction  

Lee, Hwa-Kyung (Daum Communication)
Han, Sang-Bum (Dept. of IE, Hong Ik University)
Jhee, Won-Chul (Dept. of IE, Hong Ik University)
Publication Information
Journal of Intelligence and Information Systems / v.16, no.1, 2010, pp. 93-116
Abstract
An ensemble approach is applied to detection modeling of illegal cash accommodation (ICA), a well-known type of fraudulent credit card use in Far Eastern countries that has not been addressed in the academic literature. The performance of a fraud detection model (FDM) suffers from the imbalanced data problem, which can be remedied to some extent by an ensemble of many classifiers. It is generally accepted that an ensemble of classifiers produces better accuracy than a single classifier, provided there is diversity in the ensemble. Furthermore, recent research shows that it may be better to ensemble a selected subset of classifiers than all of the classifiers at hand. For effective detection of ICA, we adopt an ensemble size reduction technique that prunes the ensemble of all classifiers using accuracy and diversity measures. Diversity in an ensemble manifests itself as disagreement or ambiguity among its members. The data imbalance intrinsic to FDM affects our approach to ICA detection in two ways. First, we propose a training procedure that uses over-sampling methods to obtain diverse training data sets. Second, we use variants of the accuracy and diversity measures that focus on the fraud class. We also calculate the diversity measure dynamically within two search procedures, Forward Addition and Backward Elimination. In our experiments, neural networks, decision trees, and logit regressions serve as the base models (ensemble members), and the performance of homogeneous ensembles is compared with that of heterogeneous ensembles. The experimental results show that, on average over the data sets tested, the reduced-size ensemble is as accurate as the unpruned version, while offering more efficient application and a less complex ensemble.
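The abstract does not spell out the pruning procedure. What follows is only a minimal sketch of greedy ensemble size reduction under stated assumptions, not the authors' method:

- Each member is represented by its 0/1 predictions (1 = fraud) on a shared validation set.
- Fraud-class recall stands in for the fraud-focused accuracy variant.
- Pairwise disagreement counted only on fraud cases stands in for the fraud-focused diversity measure.
- Members are combined by majority vote.
- A hypothetical weight `alpha` blends accuracy and diversity.
- Diversity is recomputed for every candidate sub-ensemble. This is one possible reading of calculating it "dynamically".

All function names and parameters are illustrative.

```python
from collections.abc import Sequence
from itertools import combinations

# Hypothetical setup: each ensemble member is represented only by its 0/1
# predictions (1 = fraud) on a shared validation set with true labels y.
Preds = Sequence[int]


def fraud_indices(y: Preds) -> list[int]:
    return [i for i, t in enumerate(y) if t == 1]


def fraud_recall(pred: Preds, y: Preds) -> float:
    """Assumed fraud-focused accuracy variant: share of fraud cases flagged."""
    frauds = fraud_indices(y)
    return sum(pred[i] for i in frauds) / len(frauds) if frauds else 0.0


def majority_vote(members: list[Preds]) -> list[int]:
    """Combine members by majority vote; ties are resolved toward fraud."""
    n = len(members[0])
    return [int(2 * sum(m[i] for m in members) >= len(members)) for i in range(n)]


def fraud_disagreement(members: list[Preds], y: Preds) -> float:
    """Assumed fraud-focused diversity: mean pairwise disagreement on fraud cases."""
    frauds = fraud_indices(y)
    pairs = list(combinations(members, 2))
    if not frauds or not pairs:
        return 0.0
    total = sum(sum(a[i] != b[i] for i in frauds) / len(frauds) for a, b in pairs)
    return total / len(pairs)


def score(members: list[Preds], y: Preds, alpha: float) -> float:
    """Hypothetical blend: (1 - alpha) * accuracy + alpha * diversity.

    Diversity is recomputed for each candidate sub-ensemble, so it changes
    as members are added or removed.
    """
    accuracy = fraud_recall(majority_vote(members), y)
    return (1 - alpha) * accuracy + alpha * fraud_disagreement(members, y)


def forward_addition(pool: list[Preds], y: Preds, size: int, alpha: float = 0.3) -> list[int]:
    """Greedily add the member that most improves the score until `size` are chosen."""
    if size < 1:
        raise ValueError("size must be at least 1")
    selected: list[int] = []
    remaining = list(range(len(pool)))
    while len(selected) < size and remaining:
        # max() keeps the first candidate on ties, so lower indices win ties.
        best = max(remaining, key=lambda j: score([pool[k] for k in selected + [j]], y, alpha))
        selected.append(best)
        remaining.remove(best)
    return selected


def backward_elimination(pool: list[Preds], y: Preds, size: int, alpha: float = 0.3) -> list[int]:
    """Start from all members; repeatedly drop the one whose removal scores best."""
    if size < 1:
        raise ValueError("size must be at least 1")
    selected = list(range(len(pool)))
    while len(selected) > size:
        drop = max(selected, key=lambda j: score([pool[k] for k in selected if k != j], y, alpha))
        selected.remove(drop)
    return selected
```

Consider a toy pool of five members in which two are identical and one never flags fraud. Both searches keep three members and discard the non-detecting member along with one of the duplicates. The weight `alpha` and the tie-breaking rules are arbitrary choices here and would need tuning on real ICA data.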
Keywords
Illegal Cash Accommodation; Fraud Detection System; Diversity Measure; Ensemble Size Reduction; Data Mining;