Imbalanced SVM-Based Anomaly Detection Algorithm for Imbalanced Training Datasets

  • Wang, GuiPing (College of Information Science and Engineering, Chongqing Jiaotong University)
  • Yang, JianXi (College of Information Science and Engineering, Chongqing Jiaotong University)
  • Li, Ren (College of Information Science and Engineering, Chongqing Jiaotong University)
  • Received : 2016.11.30
  • Accepted : 2017.07.03
  • Published : 2017.10.01

Abstract

Abnormal samples are usually difficult to obtain in production systems, which leads to imbalanced training sample sets: the number of positive (abnormal) samples is far smaller than the number of negative (normal) samples. Traditional Support Vector Machine (SVM)-based anomaly detection algorithms perform poorly on highly imbalanced datasets because the learned classification hyperplane skews toward the positive samples, resulting in a high false-negative rate. This article proposes a new imbalanced SVM-based anomaly detection algorithm, termed ImSVM, which assigns a different weight to each positive support vector in the decision function. ImSVM adjusts the learned classification hyperplane so that the decision function achieves the maximum GMean measure value on the dataset. The search for the optimal weight vector is formulated as an unconstrained optimization problem. ImSVM is evaluated on both Cloud datasets and Knowledge Discovery and Data Mining (KDD) datasets, from which highly imbalanced training sample sets are constructed. The experimental results show that ImSVM outperforms over-sampling techniques and several existing imbalanced SVM-based techniques.
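
The two ingredients named in the abstract, a decision function with per-positive-support-vector weights and the GMean measure, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the RBF kernel, the toy support vectors, and the convention that labels are +1 (abnormal) and -1 (normal) are all assumptions made for illustration, and the optimization step that searches for the weight vector is not shown.

```python
import math


def gmean(y_true, y_pred):
    """Geometric mean of sensitivity (TPR) and specificity (TNR).

    Labels are assumed to be +1 (positive, abnormal) and -1 (negative, normal).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def rbf_kernel(u, v, gamma=1.0):
    """RBF kernel (an assumed choice; the abstract does not name a kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def weighted_decision(x, support_vectors, alphas, labels, weights, b, gamma=1.0):
    """Sign of a decision function whose positive support vectors are rescaled.

    weights[i] multiplies the term of support vector i only when labels[i] == 1;
    entries that belong to negative support vectors are ignored.
    """
    total = b
    for sv, alpha, y, w in zip(support_vectors, alphas, labels, weights):
        scale = w if y == 1 else 1.0
        total += scale * alpha * y * rbf_kernel(sv, x, gamma)
    return 1 if total >= 0 else -1


# Hypothetical toy model: one positive and one negative support vector.
svs = [(0.0,), (2.0,)]
alphas = [1.0, 1.0]
labels = [1, -1]
x = (1.2,)

unweighted = weighted_decision(x, svs, alphas, labels, [1.0, 1.0], b=0.0)
reweighted = weighted_decision(x, svs, alphas, labels, [3.0, 1.0], b=0.0)
```

In this toy setting, raising the weight of the positive support vector moves the point `x` from the negative side of the boundary to the positive side, which is the kind of hyperplane adjustment the abstract describes. A search procedure would then score each candidate weight vector with `gmean` on the training set and keep the best one.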
