http://dx.doi.org/10.3837/tiis.2022.12.006

Tri-training algorithm based on cross entropy and K-nearest neighbors for network intrusion detection  

Zhao, Jia (Nanchang Institute of Technology, School of Information Engineering)
Li, Song (Nanchang Institute of Technology, School of Information Engineering)
Wu, Runxiu (Nanchang Institute of Technology, School of Information Engineering)
Zhang, Yiying (College of Artificial Intelligence, Tianjin University of Science & Technology)
Zhang, Bo (State Grid Smart Grid Research Institute Co., Ltd.)
Han, Longzhe (Nanchang Institute of Technology, School of Information Engineering)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.16, no.12, 2022, pp. 3889-3903
Abstract
When Tri-training is applied to network intrusion detection (NID), mislabeled samples introduce training noise that lowers detection accuracy. To address this problem, we propose a Tri-training algorithm based on cross entropy and K-nearest neighbors (TCK) for network intrusion detection. The proposed algorithm replaces the classification error rate with cross entropy, which better captures the difference between the actual and predicted distributions of the model and reduces the bias that mislabeled data introduce into predictions on unlabeled data. K-nearest neighbors are then used to remove mislabeled samples, reducing the amount of mislabeled data in the training set. To verify the effectiveness of the proposed algorithm, experiments were conducted on 12 UCI datasets and the NSL-KDD network intrusion dataset, using four metrics for comparison: accuracy, recall, F-measure and precision. The experimental results show that TCK outperforms both the conventional Tri-training algorithm and Tri-training variants that use only the cross-entropy strategy or only the K-nearest neighbors strategy.
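The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the mean negative log-likelihood form of cross entropy, the majority-vote filtering rule, and all function names (`cross_entropy`, `error_rate`, `knn_filter`) are assumptions made for illustration.

```python
import math
from collections import Counter


def cross_entropy(true_labels, predicted_probs, eps=1e-12):
    """Mean cross entropy between one-hot true labels and predicted
    class probabilities (assumed form: mean negative log-likelihood)."""
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clamp the probability to avoid log(0).
        total -= math.log(max(probs[label], eps))
    return total / len(true_labels)


def error_rate(true_labels, predicted_probs):
    """Fraction of samples whose most probable class is not the true class."""
    wrong = 0
    for label, probs in zip(true_labels, predicted_probs):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        wrong += predicted != label
    return wrong / len(true_labels)


def knn_filter(labeled, pseudo_labeled, k=3):
    """Keep a pseudo-labeled sample only if its label agrees with the
    majority label of its k nearest labeled neighbors (assumed rule)."""
    kept = []
    for x, y in pseudo_labeled:
        neighbors = sorted(labeled, key=lambda item: math.dist(x, item[0]))[:k]
        majority = Counter(lbl for _, lbl in neighbors).most_common(1)[0][0]
        if majority == y:
            kept.append((x, y))
    return kept


# Hypothetical toy data: two classifiers with the same error rate
# but different confidence on their mistakes.
true = [0, 1, 1]
mild_mistake = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
severe_mistake = [[0.55, 0.45], [0.45, 0.55], [0.95, 0.05]]
print(error_rate(true, mild_mistake), error_rate(true, severe_mistake))
print(cross_entropy(true, mild_mistake), cross_entropy(true, severe_mistake))

# Hypothetical 2-D points: the second pseudo-label contradicts its neighbors.
labeled = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0),
           ((5, 5), 1), ((5, 6), 1), ((6, 5), 1)]
pseudo = [((0.5, 0.5), 0), ((5.5, 5.5), 0), ((6, 6), 1)]
print(knn_filter(labeled, pseudo, k=3))
```

In the toy data both classifiers misclassify one sample in three, so the error rate cannot tell them apart, while cross entropy penalizes the classifier that is confidently wrong. The filter drops the pseudo-labeled point whose label disagrees with its three nearest labeled neighbors, which is the kind of mislabeled sample the abstract describes removing.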
Keywords
network intrusion detection (NID); semi-supervised learning; Tri-training; cross entropy; K-nearest neighbors