http://dx.doi.org/10.4218/etrij.2019-0205

Adaptive boosting in ensembles for outlier detection: Base learner selection and fusion via local domain competence  

Bii, Joash Kiprotich (Department of Computing, School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology)
Rimiru, Richard (Department of Computing, School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology)
Mwangi, Ronald Waweru (Department of Computing, School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology)
Publication Information
ETRI Journal / v.42, no.6, 2020, pp. 886-898
Abstract
Unusual data patterns, or outliers, can arise from human errors, incorrect measurements, or malicious activities. Detecting outliers is a difficult task that requires complex ensembles. An ideal outlier detection ensemble should exploit the strengths of individual base detectors while carefully combining their outputs to create a strong overall ensemble and achieve unbiased accuracy with minimal variance. Selecting and combining the outputs of dissimilar base learners is a challenging task. This paper proposes a model that utilizes heterogeneous base learners. In the first phase, it adaptively boosts the outcomes of preceding learners by assigning weights and identifying high-performing learners based on their local domains; in the second phase, it carefully fuses their outcomes to improve overall accuracy. Ten benchmark datasets are used to train and test the proposed model, and its ability to separate outliers from inliers is evaluated using accuracy metrics. The analyzed data are presented as crosstabs and percentages, followed by a descriptive method for synthesis and interpretation.
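The two-phase scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the base detectors (a k-NN distance scorer and an LOF-like density scorer), the neighborhood size `k`, and the use of the ensemble mean as a pseudo ground truth for competence weighting are all illustrative assumptions. Phase 1 weights each detector by its agreement with the consensus inside a point's local domain; phase 2 fuses the normalized scores with those weights.

```python
import numpy as np

# Hypothetical sketch of a heterogeneous outlier ensemble with
# local-domain competence weighting. Detector choices, k, and the
# pseudo-ground-truth target are illustrative assumptions.

def knn_scores(X, k=3):
    # Outlier score = distance to the k-th nearest neighbor.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]

def density_scores(X, k=3):
    # Outlier score = mean distance to the k nearest neighbors
    # (a crude, LOF-like density proxy).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)

def local_competence_fusion(X, detectors, k=5):
    n = X.shape[0]
    # Normalize each detector's scores to [0, 1] so they are comparable.
    S = np.stack([det(X) for det in detectors])               # shape (m, n)
    S = (S - S.min(axis=1, keepdims=True)) / (
        np.ptp(S, axis=1, keepdims=True) + 1e-12)
    # Pseudo ground truth: the ensemble mean score per point.
    target = S.mean(axis=0)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nbrs = np.argsort(d, axis=1)[:, :k]                       # local domains
    fused = np.empty(n)
    for i in range(n):
        # Phase 1: competence = agreement with the consensus
        # inside point i's local domain.
        err = np.abs(S[:, nbrs[i]] - target[nbrs[i]]).mean(axis=1)
        w = np.exp(-err)                                      # boosting-style weights
        w /= w.sum()
        # Phase 2: fuse the detectors' scores with those weights.
        fused[i] = w @ S[:, i]
    return fused

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 2)), [[8.0, 8.0]]])  # one planted outlier
scores = local_competence_fusion(X, [knn_scores, density_scores])
print(int(np.argmax(scores)))  # the planted outlier at index 40 scores highest
```

Because both detectors assign their maximum score to the planted point, the fused score is highest there regardless of the competence weights; the weighting only matters where the detectors disagree, which is exactly the local-selection behavior the paper targets.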
Keywords
adaptive boosting; base learners; heterogeneous ensembles; outlier detection;