Adaptive boosting in ensembles for outlier detection: Base learner selection and fusion via local domain competence

  • Bii, Joash Kiprotich (Department of Computing, School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology)
  • Rimiru, Richard (Department of Computing, School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology)
  • Mwangi, Ronald Waweru (Department of Computing, School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology)
  • Received : 2019.04.12
  • Accepted : 2019.12.12
  • Published : 2020.12.14

Abstract

Unusual data patterns, or outliers, can arise from human error, incorrect measurements, or malicious activity. Detecting outliers is a difficult task that requires complex ensembles. An ideal outlier detection ensemble should exploit the strengths of its individual base detectors while carefully combining their outputs, so that the overall ensemble is strong and achieves unbiased accuracy with minimal variance. Selecting and combining the outputs of dissimilar base learners is challenging. This paper proposes a model that uses heterogeneous base learners. In the first phase, it adaptively boosts the outcomes of preceding learners by assigning weights and identifies high-performing learners based on their local domains. In the second phase, it carefully fuses the outcomes of those learners to improve overall accuracy. Ten benchmark datasets are used to train and test the proposed model. Its ability to separate outliers from inliers is evaluated using accuracy metrics. The results are presented as cross-tabulations and percentages, followed by a descriptive synthesis and interpretation.
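The idea of selecting base learners by local domain competence and then fusing their outputs can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes each detector has already produced outlier scores (larger meaning more outlying), uses the mean standardized score as a pseudo ground truth, and measures competence as the Pearson correlation with that pseudo target among the k nearest training points. The boosting phase is not reproduced, and the function name `local_competence_fusion` and the parameters `k` and `top` are hypothetical.

```python
import numpy as np


def _pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def local_competence_fusion(X_train, train_scores, X_test, test_scores,
                            k=10, top=2):
    """Fuse detector scores per test point using locally competent detectors.

    train_scores and test_scores have shape (n_samples, n_detectors).
    Illustrative assumption: competence is correlation with a pseudo target.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    train_scores = np.asarray(train_scores, dtype=float)
    test_scores = np.asarray(test_scores, dtype=float)

    # Standardize every detector with training statistics so that
    # scores from heterogeneous detectors share a common scale.
    mu = train_scores.mean(axis=0)
    sd = train_scores.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)
    train_z = (train_scores - mu) / sd
    test_z = (test_scores - mu) / sd

    # Pseudo ground truth: average standardized score across detectors.
    pseudo_target = train_z.mean(axis=1)
    n_detectors = train_z.shape[1]

    fused = np.empty(len(X_test))
    for i, x in enumerate(X_test):
        # Local domain: the k training points closest to the test point.
        dist = np.linalg.norm(X_train - x, axis=1)
        region = np.argsort(dist)[:k]

        # Competence of each detector inside the local domain.
        competence = np.array([
            _pearson(train_z[region, j], pseudo_target[region])
            for j in range(n_detectors)
        ])

        # Keep the `top` most competent detectors and fuse their scores,
        # weighting each by its non-negative competence.
        best = np.argsort(competence)[-top:]
        weights = np.clip(competence[best], 0.0, None) + 1e-12
        fused[i] = np.dot(weights, test_z[i, best]) / weights.sum()
    return fused
```

A higher fused value indicates a more outlying test point. Standardizing with training statistics is one possible design choice for making heterogeneous detector scores comparable before fusion; rank-based normalization would be an alternative.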
