
http://dx.doi.org/10.7472/jksii.2017.18.6.35

A Classification Model for Attack Mail Detection based on the Authorship Analysis

Hong, Sung-Sam (Department of Computer Engineering, Gachon University)
Shin, Gun-Yoon (Department of Computer Engineering, Gachon University)
Han, Myung-Mook (Department of Computer Engineering, Gachon University)

Publication Information

Journal of Internet Computing and Services, v.18, no.6, 2017, pp. 35-46

Abstract

Recently, attacks that deliver malicious code by attaching it to an email and inducing the user to execute it have been increasing. Attachments in document file formats are especially dangerous because users open them readily. Authorship analysis is a research area within NLP (Natural Language Processing) and text mining that studies methods for identifying authors by analyzing sentences, texts, and documents written in a specific language. An attack mail is written by the attacker. Therefore, by analyzing the contents of the mail and the attached document file and identifying the corresponding author, it is possible to discover features that distinguish attack mail more clearly from normal mail and to improve detection accuracy. In this paper, we propose the IADA2 (Intelligent Attack mail Detection based on Authorship Analysis) model for attack mail detection. We construct a feature vector for classifying and detecting attack mail from the features used in existing machine-learning-based spam detection models and the features used in authorship analysis of documents, and we build the IADA2 detection model on it. We improve on attack mail detection models that rely on simple term features by applying n-grams to extract features that reflect the sequence characteristics of words. Experimental results show that the proposed method improves performance depending on the feature combination, the feature selection technique, and the choice of an appropriate model.
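The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the IADA2 implementation: the function names, the three writing-style features, the tiny vocabulary, and the sample mail text are all hypothetical choices made for illustration. It only shows the general idea of combining term and word n-gram counts with authorship-style features into one vector that a classifier could consume.

```python
from collections import Counter
import re


def word_ngrams(text, n):
    """Count word n-grams (as tuples) in lower-cased text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def stylometric_features(text):
    """A few illustrative writing-style features (hypothetical selection)."""
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    n_tokens = len(tokens) or 1  # avoid division by zero on empty text
    return {
        "avg_word_len": sum(len(t) for t in tokens) / n_tokens,
        "upper_ratio": sum(t.isupper() for t in tokens) / n_tokens,
        "exclaim_count": text.count("!"),
    }


def feature_vector(text, vocabulary):
    """Concatenate term/bigram counts over a fixed vocabulary with style features."""
    counts = word_ngrams(text, 1) + word_ngrams(text, 2)
    style = stylometric_features(text)
    # Style features are appended in sorted key order so the layout is stable.
    return [counts[g] for g in vocabulary] + [style[k] for k in sorted(style)]


# Hypothetical vocabulary: one unigram and two bigrams.
vocab = [("invoice",), ("open", "the"), ("the", "attached")]
mail = "URGENT! Please open the attached invoice. Open the file now!"
vec = feature_vector(mail, vocab)
print(vec[:3])  # n-gram counts: [1, 2, 1]
```

In practice the vocabulary would be learned from a training corpus and pruned by a feature selection step before the vectors are passed to a classifier; those stages are omitted here.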

Keywords

Text Mining; Machine Learning; Classification; Authorship Analysis; Attacker Identification

