Warning Classification Method Based on Artificial Neural Network Using Topics of Source Code

  • Jung-Been Lee (Chronobiology Institute, Korea University)
  • Received: 2020.09.16
  • Reviewed: 2020.11.04
  • Published: 2020.11.30

Abstract

Automated static analysis tools help developers find potential defects in source code quickly and with little effort. However, these tools also report a large number of false-positive warnings that developers do not need to fix. In this study, we propose an artificial neural network-based warning classification method that uses topic models of source code blocks. We collect bug-fixing revisions from a software change management (SCM) system and extract the code blocks that developers modified. Using topic modeling, we compute the topic distribution of each collected code block. In the deep learning stage, these topic distribution values are used as the input of a simple artificial neural network. Binary data indicating whether each warning in a code block was removed between revisions is used as the target output. In our experiments, the neural network-based classification model predicted with high performance whether a warning is a true positive or a false positive.
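The pipeline summarized above can be illustrated with a minimal Python sketch. It makes two assumptions. First, each code block's topic distribution has already been computed by a topic model such as LDA. Second, a warning that disappears after a bug-fixing revision is labeled 1 (treated as a true positive), and a warning that persists is labeled 0. The names `label_warnings` and `TinyNet`, the warning keys, the one-hidden-layer architecture, and all hyperparameters are hypothetical illustrations, not the paper's implementation or settings. The network uses a softmax output with cross-entropy loss and is trained with plain stochastic gradient descent.

```python
import math
import random


def label_warnings(before, after):
    """Label each warning seen before a bug-fixing revision.

    Hypothetical rule: a warning that is gone after the revision is
    labeled 1 (treated as a true positive); one that persists is 0.
    """
    return {w: int(w not in after) for w in before}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class TinyNet:
    """Hypothetical one-hidden-layer network: sigmoid hidden units and a
    2-way softmax output, trained with SGD on cross-entropy loss."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(2)]
        self.b2 = [0.0, 0.0]

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        logits = [sum(w * hj for w, hj in zip(row, h)) + b
                  for row, b in zip(self.w2, self.b2)]
        return h, softmax(logits)

    def train(self, xs, ys, epochs=2000, lr=0.5):
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                h, p = self.forward(x)
                # Cross-entropy gradient w.r.t. the softmax logits is p - onehot(y).
                d_out = [p[k] - (1.0 if k == y else 0.0) for k in range(2)]
                # Backpropagate to the hidden layer using the pre-update output weights.
                d_hid = [sum(d_out[k] * self.w2[k][j] for k in range(2)) * h[j] * (1.0 - h[j])
                         for j in range(len(h))]
                for k in range(2):
                    for j in range(len(h)):
                        self.w2[k][j] -= lr * d_out[k] * h[j]
                    self.b2[k] -= lr * d_out[k]
                for j in range(len(h)):
                    for i in range(len(x)):
                        self.w1[j][i] -= lr * d_hid[j] * x[i]
                    self.b1[j] -= lr * d_hid[j]

    def predict(self, x):
        """Return 1 (warning predicted to be removed) or 0 (predicted to persist)."""
        _, p = self.forward(x)
        return 1 if p[1] > p[0] else 0


# Usage with invented data: warnings keyed by (file, warning id), and
# 3-topic distributions for six code blocks with their binary labels.
before = {("A.java", "warning-1"), ("B.java", "warning-2")}
after = {("B.java", "warning-2")}
print(label_warnings(before, after))

xs = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05],
      [0.1, 0.1, 0.8], [0.2, 0.1, 0.7], [0.05, 0.15, 0.8]]
ys = [1, 1, 1, 0, 0, 0]
net = TinyNet(n_in=3, n_hidden=4)
net.train(xs, ys)
print([net.predict(x) for x in xs])
```

The toy topic vectors are invented for illustration. In practice they would be derived from the code blocks extracted from the SCM history. A meaningful evaluation would also measure performance on held-out warnings rather than on the training set.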
