http://dx.doi.org/10.7472/jksii.2019.20.6.11

An Efficient BotNet Detection Scheme Exploiting Word2Vec and Accelerated Hierarchical Density-based Clustering  

Lee, Taeil (Dept. of Computer Science and Information Engineering, Korea National University of Transportation)
Kim, Kwanhyun (Dept. of Computer Science and Information Engineering, Korea National University of Transportation)
Lee, Jihyun (Dept. of Computer Science and Information Engineering, Korea National University of Transportation)
Lee, Suchul (Dept. of Computer Science and Information Engineering, Korea National University of Transportation)
Publication Information
Journal of Internet Computing and Services / v.20, no.6, 2019, pp. 11-20
Abstract
Numerous enterprises, organizations, and individual users are exposed to large-scale DDoS (Distributed Denial of Service) attacks. DDoS attacks are carried out through a botnet, which consists of many computers infected with malware (zombie PCs) and a special computer that controls the zombie PCs through a hierarchical chain of command. To detect malware, detection software or antivirus (vaccine) programs must identify the malware signature through in-depth analysis, and these signatures must be updated in advance. This process is time-consuming and costly. In this paper, we propose a botnet detection scheme that uses an artificial neural network model and requires no periodic signature updates. The proposed scheme exploits Word2Vec and accelerated hierarchical density-based clustering. The botnet detection performance of the proposed scheme was evaluated on the CTU-13 dataset. Experimental results show a detection rate of 99.9%, which outperforms the conventional method.
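The abstract names Word2Vec as the first ingredient. A skip-gram Word2Vec model learns a vector for each token by predicting the tokens around it, so its training data are (center, context) pairs taken from a sliding window. The sketch below shows only that pair-generation step. It assumes, hypothetically, that each network flow is turned into a "sentence" of `field=value` tokens; the token format, the helper names `flow_to_tokens` and `skipgram_pairs`, and the window size are illustrative assumptions, not the paper's actual feature encoding.

```python
from typing import Dict, List, Tuple


def flow_to_tokens(flow: Dict[str, object]) -> List[str]:
    """Hypothetical encoding: each flow field becomes one "word", e.g. "proto=tcp".

    The field order follows the dict's insertion order.
    """
    return [f"{key}={value}" for key, value in flow.items()]


def skipgram_pairs(sentence: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return the (center, context) pairs used by the skip-gram objective.

    Every token within `window` positions of the center token, excluding
    the center itself, is emitted as a context word for that center.
    """
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(sentence):
        lo = max(0, i - window)                   # clip the window at the start
        hi = min(len(sentence), i + window + 1)   # clip the window at the end
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs


if __name__ == "__main__":
    # A made-up flow record, for illustration only.
    tokens = flow_to_tokens({"proto": "tcp", "dport": 6667, "state": "S0"})
    for center, context in skipgram_pairs(tokens, window=1):
        print(center, "->", context)
```

To learn actual embeddings, such token sentences could be handed to an off-the-shelf implementation, for example gensim's `Word2Vec` with `sg=1` (skip-gram); the window size and vector dimension would then have to be tuned on the traffic data.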
Keywords
BotNet Detection; Word2Vec; Clustering; Skip-gram
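The second ingredient in the abstract is hierarchical density-based clustering of the learned vectors. The paper's "accelerated" variant is not described on this page, so the sketch below shows only the core idea that HDBSCAN-style methods build on: a core distance per point (distance to its k-th nearest other point), the mutual reachability distance max(core(a), core(b), d(a, b)), and a minimum spanning tree over that distance. As a simplification, it cuts the tree at a fixed threshold `eps` instead of using HDBSCAN's cluster-stability selection. The names `k`, `eps`, and `min_cluster_size` are illustrative assumptions, and points labeled `-1` are treated as noise.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def core_distances(points: List[Point], k: int) -> List[float]:
    """Distance from each point to its k-th nearest *other* point."""
    cores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[k - 1])
    return cores


def mst_edges(points: List[Point], cores: List[float]) -> List[Tuple[float, int, int]]:
    """Prim's algorithm on the complete graph weighted by mutual reachability distance."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each point to the tree
    parent = [-1] * n       # tree endpoint that gives that cheapest connection
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                mrd = max(cores[u], cores[v], math.dist(points[u], points[v]))
                if mrd < best[v]:
                    best[v], parent[v] = mrd, u
    return edges


def cluster(points: List[Point], k: int = 2, eps: float = 1.0,
            min_cluster_size: int = 3) -> List[int]:
    """Simplified flat cut of the MST: keep edges with weight <= eps.

    Connected components smaller than min_cluster_size are labeled -1 (noise).
    """
    cores = core_distances(points, k)
    uf = list(range(len(points)))  # union-find forest

    def find(x: int) -> int:
        while uf[x] != x:
            uf[x] = uf[uf[x]]  # path halving
            x = uf[x]
        return x

    for weight, a, b in mst_edges(points, cores):
        if weight <= eps:
            uf[find(a)] = find(b)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)

    labels = [-1] * len(points)
    next_label = 0
    # Number clusters in order of their lowest point index, for stable output.
    for members in sorted(groups.values(), key=lambda m: m[0]):
        if len(members) >= min_cluster_size:
            for i in members:
                labels[i] = next_label
            next_label += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    print(cluster(pts, k=2, eps=2.0, min_cluster_size=3))
```

This brute-force version is O(n^2) and is meant only to make the definitions concrete; for real traffic data, full implementations such as scikit-learn's `HDBSCAN` or the `hdbscan` package would be the more practical starting point.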
Citations & Related Records
Times Cited By KSCI: 1