Improvement of Performance of Malware Similarity Analysis by the Sequence Alignment Technique

  • 조인겸 (Computer Software Major, Hanyang University);
  • 임을규 (Division of Computer Science and Engineering, Hanyang University)
  • Received : 2014.09.05
  • Accepted : 2014.12.22
  • Published : 2015.03.15

Abstract

Malware variants are malicious executable files that have similar functions but different structures, so it is useful to classify them into the same group. To classify such variants, this paper applies sequence alignment, a technique from bioinformatics, to find the common parts of the API call information of malware samples. The performance of sequence alignment depends on the length of the API call information and degrades severely as that length grows. We therefore remove repeated patterns from the API call information before applying sequence alignment, so that its performance is preserved. Finally, we discuss how to compute the similarity between malware samples using sequence alignment, and we present experimental results on real malware samples.
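The pipeline outlined in the abstract (remove repeated patterns, align, score similarity) might be sketched as follows. This is an illustrative sketch only, not the authors' implementation: it assumes Smith-Waterman local alignment as the alignment step, and the function names, the `max_period` limit, the scoring constants, and the normalization by the shorter sequence are all hypothetical choices that the abstract does not specify.

```python
from typing import List, Sequence

# Scoring constants are assumptions chosen for illustration only.
MATCH, MISMATCH, GAP = 2, -1, -1


def collapse_repeats(calls: Sequence[str], max_period: int = 4) -> List[str]:
    """Collapse immediately repeated patterns of length 1..max_period to one copy."""
    out: List[str] = []
    for call in calls:
        out.append(call)
        changed = True
        while changed:
            changed = False
            for p in range(1, max_period + 1):
                # If the last p calls repeat the p calls before them, drop one copy.
                if len(out) >= 2 * p and out[-p:] == out[-2 * p:-p]:
                    del out[-p:]
                    changed = True
                    break
    return out


def local_alignment_score(a: Sequence[str], b: Sequence[str]) -> int:
    """Best Smith-Waterman local alignment score, keeping one DP row at a time."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (MATCH if x == y else MISMATCH)
            score = max(0, diag, prev[j] + GAP, cur[j - 1] + GAP)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def similarity(a: Sequence[str], b: Sequence[str], max_period: int = 4) -> float:
    """Alignment score after repeat removal, normalized to the range [0, 1]."""
    a2 = collapse_repeats(a, max_period)
    b2 = collapse_repeats(b, max_period)
    shortest = min(len(a2), len(b2))
    if shortest == 0:
        return 0.0
    # A perfect match of the shorter sequence scores MATCH * shortest.
    return local_alignment_score(a2, b2) / (MATCH * shortest)


if __name__ == "__main__":
    trace_a = ["CreateFileW", "WriteFile", "WriteFile", "WriteFile", "CloseHandle"]
    trace_b = ["CreateFileW", "WriteFile", "CloseHandle"]
    print(collapse_repeats(trace_a))
    print(similarity(trace_a, trace_b))
```

The alignment fills a table with one cell per pair of calls, so its cost grows with the product of the two sequence lengths. Collapsing repeats first shrinks both lengths, which is why the preprocessing step helps in this sketch.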

Acknowledgement

Supported by: National Research Foundation of Korea
