http://dx.doi.org/10.7472/jksii.2017.18.6.35

A Classification Model for Attack Mail Detection based on the Authorship Analysis  

Hong, Sung-Sam (Department of Computer Engineering, Gachon University)
Shin, Gun-Yoon (Department of Computer Engineering, Gachon University)
Han, Myung-Mook (Department of Computer Engineering, Gachon University)
Publication Information
Journal of Internet Computing and Services, v.18, no.6, 2017, pp. 35-46
Abstract
Cyber attacks that attach malicious code to an email and induce the user to execute it have recently been increasing. Such attacks are especially dangerous because the code is easy to trigger when it is attached as a document file. Authorship analysis is a research area studied in NLP (Natural Language Processing) and text mining; it identifies authors by analyzing the sentences, texts, and documents written in a specific language. An attack mail is written by the attacker. Therefore, by analyzing the contents of the mail and the attached document file and identifying their author, it is possible to discover features that distinguish attack mail from normal mail more clearly and to improve detection accuracy. In this paper, we propose the IADA2 (Intelligent Attack mail Detection based on Authorship Analysis) model for attack mail detection. We construct a feature vector for classifying and detecting attack mail from the features used in existing machine-learning-based spam detection models and the features used in authorship analysis of documents, and build the IADA2 detection model on it. We improve on attack mail detection models that rely only on simple term features by extracting features that reflect the sequence characteristics of words through n-grams. Experimental results show that the proposed method improves performance depending on the feature combination, the feature selection technique, and the choice of an appropriate model.
Keywords
Text Mining; Machine Learning; Classification; Authorship Analysis; Attacker Identification;
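The abstract's contrast between plain term features and n-gram features that capture word order can be illustrated with a minimal sketch. This is not the IADA2 implementation: the function names (`tokenize`, `word_ngrams`, `ngram_features`, `vectorize`), the tokenization rule, and the two sample mails are all assumptions made for illustration, and only the Python standard library is used.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (an assumed, simple rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, n_values=(1, 2)):
    """Count word n-grams for each n in n_values (n = 1 gives plain term counts)."""
    tokens = tokenize(text)
    counts = Counter()
    for n in n_values:
        counts.update(word_ngrams(tokens, n))
    return counts


def vectorize(texts, n_values=(1, 2)):
    """Map each text to a count vector over a shared, sorted n-gram vocabulary."""
    per_text = [ngram_features(t, n_values) for t in texts]
    vocab = sorted(set().union(*per_text))
    vectors = [[counts[gram] for gram in vocab] for counts in per_text]
    return vocab, vectors


# Hypothetical sample mails: almost the same terms, different word order.
mails = [
    "please open the attached invoice",
    "the attached invoice is overdue please open",
]
vocab, vectors = vectorize(mails)
```

In this toy input both mails contain the terms "open" and "the", so unigram counts alone do not separate them on those terms, while the bigram ("open", "the") appears only in the first mail. Vectors of this kind could then be passed to a feature selection step and a classifier, which the abstract mentions but does not specify.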