Learning Behavior Analysis of Bayesian Algorithm Under Class Imbalance Problems  

Hwang, Doo-Sung (Department of Computer Science, Dankook University)
Abstract
In this paper we analyse the behavior of the Bayesian algorithm in learning class imbalance problems and compare performance evaluation methods. The learning performance of the Bayesian algorithm is evaluated over class imbalance problems generated by varying the prior data distribution, the imbalance data rate, and the discrimination complexity. The experimental results are calculated as the AUC (Area Under the Curve) values of both the ROC (Receiver Operating Characteristic) and PR (Precision-Recall) evaluation measures and are compared according to imbalance data rate and discrimination complexity. In the comparison and analysis, the Bayesian algorithm suffers from the imbalance rate, in agreement with previously reported research, and the data overlapping caused by discrimination complexity is another factor that hampers learning performance. As the discrimination complexity and class imbalance rate of the problems increase, the learning performance measured by the AUC of the PR measure varies much more than that measured by the AUC of the ROC measure. When the discrimination complexity and class imbalance rate are low, however, the two measures give similar results. The experimental results show that the AUC of the PR measure is more appropriate for evaluating learning on class imbalance problems and, furthermore, is beneficial in designing an optimal learning model that takes misclassification cost into account.
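The comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's experimental setup: it assumes one-dimensional Gaussian classes, uses the distance between class means (`separation`) as a stand-in for discrimination complexity, and estimates the PR AUC by average precision. All function names and parameter values here are hypothetical and chosen only for illustration.

```python
import math
import random


def make_data(n_neg, n_pos, separation, rng):
    """Draw 1-D samples: negatives ~ N(0, 1), positives ~ N(separation, 1)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    xs += [rng.gauss(separation, 1.0) for _ in range(n_pos)]
    ys = [0] * n_neg + [1] * n_pos
    return xs, ys


def fit_gaussian_bayes(xs, ys):
    """Estimate (prior, mean, variance) for each class from training data."""
    params = {}
    for c in (0, 1):
        vals = [x for x, y in zip(xs, ys) if y == c]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        params[c] = (len(vals) / len(xs), mean, max(var, 1e-9))
    return params


def log_odds(x, params):
    """log P(y=1 | x) - log P(y=0 | x), via Bayes' rule with Gaussian likelihoods."""
    def log_joint(c):
        prior, mean, var = params[c]
        return (math.log(prior) - 0.5 * math.log(2 * math.pi * var)
                - (x - mean) ** 2 / (2 * var))
    return log_joint(1) - log_joint(0)


def roc_pr_auc(scores, ys):
    """Return (ROC AUC, average precision) by sweeping thresholds downward."""
    pairs = sorted(zip(scores, ys), key=lambda p: -p[0])
    n_pos = sum(ys)
    n_neg = len(ys) - n_pos
    tp = fp = 0
    roc = ap = 0.0
    i = 0
    while i < len(pairs):
        # Process all samples that share the same score as one threshold step.
        j, tp_new, fp_new = i, tp, fp
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp_new += 1
            else:
                fp_new += 1
            j += 1
        # Trapezoid rule in ROC space (x = FPR, y = TPR).
        roc += (fp_new - fp) / n_neg * (tp + tp_new) / (2 * n_pos)
        # Average precision: recall increment times precision at this threshold.
        ap += (tp_new - tp) / n_pos * tp_new / (tp_new + fp_new)
        tp, fp, i = tp_new, fp_new, j
    return roc, ap


def evaluate(imbalance_ratio, separation, n_neg=2000, seed=0):
    """Train on one sample, score a second sample, return (ROC AUC, PR AUC)."""
    rng = random.Random(seed)
    n_pos = max(2, n_neg // imbalance_ratio)
    train_x, train_y = make_data(n_neg, n_pos, separation, rng)
    test_x, test_y = make_data(n_neg, n_pos, separation, rng)
    params = fit_gaussian_bayes(train_x, train_y)
    scores = [log_odds(x, params) for x in test_x]
    return roc_pr_auc(scores, test_y)


if __name__ == "__main__":
    for ratio in (1, 10, 100):          # negatives per positive
        for sep in (3.0, 1.0):          # smaller separation = more overlap
            roc, pr = evaluate(ratio, sep)
            print(f"ratio={ratio:>3} sep={sep}: ROC AUC={roc:.3f}  PR AUC={pr:.3f}")
```

Running the script prints both measures for each combination of imbalance ratio and class separation, so the two can be compared side by side as the problem becomes harder. The log-odds is used as the ranking score instead of the posterior itself because it is a monotone transform of the posterior and avoids saturation at 0 or 1.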
Keywords
Class imbalance problem; Data overlapping; Bayesian algorithm; Performance evaluation;