[KSCI] Korea Science Citation Index Service

Learning Behavior Analysis of Bayesian Algorithm Under Class Imbalance Problems

Hwang, Doo-Sung (Department of Computer Science, Dankook University)

Publication Information

Journal of the Institute of Electronics Engineers of Korea CI / v.45, no.6, 2008, pp. 179-186

Abstract

In this paper we analyse how the Bayesian algorithm behaves when learning class imbalance problems and compare performance evaluation methods. The learning performance of the Bayesian algorithm is evaluated on class imbalance problems generated by varying the prior data distribution, the imbalance rate and the discrimination complexity. The experimental results are reported as AUC (Area Under the Curve) values for both the ROC (Receiver Operating Characteristic) and PR (Precision-Recall) evaluation measures and are compared across imbalance rates and discrimination complexities. The analysis shows that the performance of the Bayesian algorithm degrades as the imbalance rate increases, in agreement with previously reported studies, and that the data overlapping caused by discrimination complexity is another factor that hampers learning performance. As the discrimination complexity and class imbalance rate of the problems increase, the AUC of the PR measure varies much more than the AUC of the ROC measure, whereas the two measures behave similarly when the discrimination complexity and class imbalance rate are low. The experimental results show that the AUC of the PR measure is better suited to evaluating learning on class imbalance problems and is also useful in designing an optimal learning model that takes misclassification cost into account.
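The comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's experimental setup: it assumes a single Gaussian feature per class, treats the distance between class means (`separation`) as a stand-in for discrimination complexity, and estimates the PR area by average precision. All function names and parameter values are hypothetical.

```python
import math
import random


def sample(n_neg, n_pos, separation, rng):
    """Negatives ~ N(0, 1), positives ~ N(separation, 1); smaller separation = more overlap."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    xs += [rng.gauss(separation, 1.0) for _ in range(n_pos)]
    ys = [0] * n_neg + [1] * n_pos
    return xs, ys


def fit(xs, ys):
    """Estimate (prior, mean, variance) for each class from the training sample."""
    params = {}
    for c in (0, 1):
        vals = [x for x, y in zip(xs, ys) if y == c]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        params[c] = (len(vals) / len(xs), mean, var)
    return params


def log_odds(params, x):
    """Log posterior odds of the positive class under Bayes' rule."""
    def log_joint(c):
        prior, mean, var = params[c]
        return math.log(prior) - 0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
    return log_joint(1) - log_joint(0)


def roc_auc(scores, ys):
    """Mann-Whitney form of the ROC AUC (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_pos = sum(ys)
    n_neg = len(ys) - n_pos
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if ys[i] == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def pr_auc(scores, ys):
    """Average precision, a step-wise estimate of the area under the PR curve."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for k, i in enumerate(order, start=1):
        if ys[i] == 1:
            hits += 1
            total += hits / k
    return total / hits


def evaluate(n_neg, ratio, separation, seed=0):
    """Train and test on independent samples with n_neg : n_pos = ratio : 1."""
    rng = random.Random(seed)
    n_pos = n_neg // ratio
    train_x, train_y = sample(n_neg, n_pos, separation, rng)
    test_x, test_y = sample(n_neg, n_pos, separation, rng)
    params = fit(train_x, train_y)
    scores = [log_odds(params, x) for x in test_x]
    return roc_auc(scores, test_y), pr_auc(scores, test_y)


if __name__ == "__main__":
    for separation in (3.0, 1.0):          # low vs. high class overlap
        for ratio in (1, 10, 100):         # negatives per positive
            roc, pr = evaluate(5000, ratio, separation)
            print(f"separation={separation} ratio={ratio}:1 ROC-AUC={roc:.3f} PR-AUC={pr:.3f}")
```

In a setup like this, the ROC AUC depends only on how the two classes are ranked against each other and so should change little with the class ratio, while precision depends directly on the share of positives, which is one way to read the abstract's observation that the PR measure varies more under imbalance and overlap.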

Keywords

Class imbalance problem; Data overlapping; Bayesian algorithm; Performance evaluation;

Citations & Related Records

Reference

1	Japkowicz N. and Stephen S., "The Class Imbalance Problem: A Systematic Study," Intelligent Data Analysis, Vol. 6, no. 5, pp. 429-450, November 2002
2	Ronaldo C. Prati, Gustavo E. A. P. A. Batista and Maria Carolina Monard, "Class Imbalances versus Class Overlapping: An Analysis of a Learning System Behavior," MICAI, pp. 312-321, 2004
3	Maciej A. Mazurowski, Piotr A. Habas, Jacek M. Zurada, Joseph Y. Lo, Jay A. Baker and Georgia D. Tourassi, "Training neural network classifiers for medical decision making: The effects of imbalanced datasets on classification performance," Neural Networks, Vol. 21, no. 2-3, pp. 427-436, 2008
4	Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
5	Gary M. Weiss and Foster J. Provost, "Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction," J. Artif. Intell. Res. (JAIR), Vol. 19, pp. 315-354, 2003
6	Visa, S. and Ralescu, A., "The effect of imbalanced data class distribution on fuzzy classifiers-experimental study," Proceedings of the FUZZ-IEEE Conference, 2005
7	Dimitriadou E, Hornik K, Leisch F, Meyer D and Weingessel A, "e1071: Misc Functions of the Department of Statistics(e1071)", Version 1.5-11, TU Wien, 2007
8	C. Ferri, P. Flach and J. Hernández-Orallo, "Learning Decision Trees Using the Area Under ROC Curve," Proceedings of the 19th International Conference on Machine Learning (ICML-2002), pp. 139-146, 2002
9	Gustavo E. A. P. A. Batista, Ronaldo C. Prati and Maria Carolina Monard, "A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data," SIGKDD Explorations, Vol. 6, 2004
10	Jie Gu, Yuanbing Zhou and Xianqiang Zuo, "Making Class Bias Useful: A Strategy of Learning from Imbalanced Data," Intelligent Data Engineering and Automated Learning (IDEAL), pp. 287-295, 2007
11	Jesse Davis and Mark Goadrich, "The relationship between Precision-Recall and ROC curves," Proceedings of the 23rd International Conference on Machine Learning (ICML-2006), pp. 233-240, 2006
12	Jin Huang and Charles X. Ling, "Using AUC and Accuracy in Evaluating Learning Algorithms," IEEE Trans. Knowl. Data Eng., Vol. 17, no. 3, pp. 299-310, 2005
13	Vicente García and Ramón Alberto Mollineda, "An Empirical Study of the Behavior of Classifiers on Imbalanced and Overlapped Data Sets," CIARP, pp. 397-406, 2007
14	Tie-Yan Liu, Yiming Yang, Hao Wan, Hua-Jun Zeng, Zheng Chen and Wei-Ying Ma, "Support Vector Machines Classification with A Very Large-scale Taxonomy," SIGKDD Explorations, Vol. 7, no. 1, 2005
15	Yuchun Tang, Sven Krasser, Paul Judge and Yan-Qing Zhang, "Fast and Effective Spam Sender Detection with Granular SVM on Highly Imbalanced Mail Server Behavior Data," Collaborative Computing: Networking, Applications and Worksharing, pp. 1-6, 2006
16	Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd edition, Elsevier, 2005

Original title (Korean): 클래스 불균형 문제에서 베이지안 알고리즘의 학습 행위 분석