http://dx.doi.org/10.13089/JKIISC.2020.30.6.1131

A Study on the Improvement of Source Code Static Analysis Using Machine Learning  

Park, Yang-Hwan (Graduate School of Information Security, Korea University)
Choi, Jin-Young (Graduate School of Information Security, Korea University)
Abstract
Static analysis of source code aims to find the security weaknesses that remain across a wide range of source code. A static analysis tool produces the findings, and a static analysis expert then classifies each finding as a true positive or a false positive. Because the volume of findings is large and the false-positive rate is high, this process takes considerable time and effort, and a more efficient method of analysis is needed. Moreover, when classifying true and false positives, experts rarely analyze only the line of source code where the defect was reported. Depending on the type of defect, they analyze the surrounding source code as well before delivering the final result. To ease the difficulty experts face in distinguishing true positives from false positives in the output of static analysis tools, this paper proposes a method in which artificial intelligence, rather than an expert, determines whether a security weakness found by a static analysis tool is a true positive. In addition, an experiment examined how the size of the training data used for this machine learning (the source code around the defect) affects performance, and the optimal size was identified. The results are expected to help static analysis experts classify true and false positives after static analysis.
Keywords
Static Analysis; Secure Coding; Deep Learning; Convolutional Neural Networks (CNN)
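
The abstract's idea of using the source code around a reported defect as model input can be illustrated with a short sketch. This is not the authors' implementation. The function name `extract_context`, the `radius` parameter, and the sample C fragment are all hypothetical, chosen only to show how a window of surrounding lines might be cut out before being passed to a classifier.

```python
from typing import List


def extract_context(source: str, defect_line: int, radius: int) -> List[str]:
    """Return the flagged line plus up to `radius` lines on each side.

    `defect_line` is 1-indexed (an assumption; many static analysis
    reports number lines this way). The window is clipped at the
    beginning and end of the file.
    """
    lines = source.splitlines()
    if radius < 0:
        raise ValueError("radius must be non-negative")
    if not 1 <= defect_line <= len(lines):
        raise ValueError("defect_line is outside the file")
    start = max(0, defect_line - 1 - radius)    # 0-indexed, inclusive
    end = min(len(lines), defect_line + radius)  # 0-indexed, exclusive
    return lines[start:end]


# Hypothetical example: a tool flags line 3 of a small C fragment.
SAMPLE = "\n".join([
    "char buf[8];",
    "char *src = read_input();",
    "strcpy(buf, src);",
    "log(buf);",
    "return 0;",
])

window = extract_context(SAMPLE, defect_line=3, radius=1)
print(window)
```

Varying `radius` is one simple way to change how much surrounding code each training sample contains, which is the quantity the abstract says was tuned experimentally.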