http://dx.doi.org/10.6109/jkiice.2019.23.5.533

Dynamic RNN-CNN Malware Classifier Corresponding with Random-Dimension Input Data

Lim, Geun-Young (Department of Information Security, Daejeon University)
Cho, Young-Bok (Department of Information Security, Daejeon University)
Abstract
This study proposes a malware classification model that can handle arbitrary-length input data, evaluated on the Microsoft Malware Classification Challenge dataset. The approach builds on the existing practice of imaging malware data: the proposed model generates many images when a malware sample is large and few images when it is small. The resulting image sequence is learned as time-series data by a Dynamic RNN. Applying an attention mechanism, only the most heavily weighted RNN outputs are retained, and these outputs are learned again by a Residual CNN to classify the malware. Experiments with the proposed model achieved a micro-average F1 score of 92% on the validation data set. These results verify that a model can learn and classify arbitrary-length data without special feature extraction or dimensionality reduction.
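The pipeline described above (malware bytes imaged into a variable-length sequence, a Dynamic RNN over that sequence, attention over the RNN outputs, and a Residual CNN classifier) can be sketched as follows. This is a minimal illustration in PyTorch; the framework, layer sizes, the 64x64 image chunking, the use of a GRU, and the nine-class output (the Microsoft challenge's malware families) are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of the imaging -> Dynamic RNN -> attention -> Residual CNN
# pipeline, assuming PyTorch. Sizes and layer choices are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


def bytes_to_images(data: bytes, side: int = 64) -> torch.Tensor:
    """Split a malware binary into as many side x side grayscale images as its
    length requires (zero-padding the tail), so large samples yield many images
    and small samples yield few."""
    chunk = side * side
    n = max(1, -(-len(data) // chunk))                      # ceil division, at least one image
    buf = torch.zeros(n * chunk, dtype=torch.float32)
    buf[:len(data)] = torch.tensor(list(data), dtype=torch.float32) / 255.0
    return buf.view(n, 1, side, side)                       # (seq_len, channels, H, W)


class ResidualBlock(nn.Module):
    """Simple residual CNN block applied to the attended RNN features."""
    def __init__(self, ch: int):
        super().__init__()
        self.conv1 = nn.Conv1d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv1d(ch, ch, 3, padding=1)

    def forward(self, x):
        return F.relu(x + self.conv2(F.relu(self.conv1(x))))


class DynamicRnnCnnClassifier(nn.Module):
    def __init__(self, side: int = 64, hidden: int = 128, n_classes: int = 9):
        super().__init__()
        self.encoder = nn.Sequential(                        # per-image feature extractor
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.rnn = nn.GRU(32 * 4 * 4, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)                     # scores each RNN output step
        self.res = ResidualBlock(1)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, seqs: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
        # seqs: (batch, max_seq_len, 1, H, W); lengths: 1-D CPU LongTensor of image counts.
        b, t = seqs.shape[:2]
        feats = self.encoder(seqs.view(b * t, *seqs.shape[2:])).view(b, t, -1)
        packed = pack_padded_sequence(feats, lengths, batch_first=True,
                                      enforce_sorted=False)
        outputs, _ = self.rnn(packed)
        outputs, _ = pad_packed_sequence(outputs, batch_first=True)   # (b, t, hidden)
        # Attention: weight the RNN outputs so the most relevant steps dominate.
        scores = self.attn(outputs).squeeze(-1)                       # (b, t)
        mask = torch.arange(outputs.size(1))[None, :] >= lengths[:, None]
        scores = scores.masked_fill(mask, float('-inf'))
        weights = torch.softmax(scores, dim=1).unsqueeze(-1)
        context = (weights * outputs).sum(dim=1)                      # (b, hidden)
        # Re-learn the attended representation with a residual CNN, then classify.
        context = self.res(context.unsqueeze(1)).squeeze(1)
        return self.out(context)


# Usage with a hypothetical tiny sample: two 64x64 images are produced,
# and the model returns logits over the nine malware families.
model = DynamicRnnCnnClassifier()
imgs = bytes_to_images(b"\x4d\x5a" + bytes(5000))
logits = model(imgs.unsqueeze(0), torch.tensor([imgs.size(0)]))
```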
Keywords
RNN; CNN; malware; deep learning; micro-average F1 score