Detecting Inconsistent Code Identifiers

  • Received : 2013.01.04
  • Accepted : 2013.01.15
  • Published : 2013.05.30

Abstract

Software maintainers rely heavily on source code identifiers to understand a program's source code. Using identifiers inconsistently across a code base therefore makes the software harder to understand and ultimately increases maintenance cost. Developers can address this problem through peer review, but when the code base is large it may be infeasible to inspect all of the code. This paper introduces an approach that uses natural language processing techniques to detect inconsistent identifiers in Java source code automatically. The approach extracts every identifier in a project, splits each one into tokens and assigns part-of-speech (POS) tags, classifies syntactically similar and semantically similar terms, and finally detects inconsistent identifiers by applying a set of proposed rules. We also developed a supporting tool, CodeAmigo, that implements the approach. To show its feasibility, we applied the tool to two popular Java-based open source projects and computed the precision of the detection results.
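The pipeline described in the abstract (tokenize identifiers, find syntactically and semantically similar terms, apply rules) can be illustrated with a small sketch. This is not CodeAmigo's implementation, and several pieces are assumptions: the camelCase tokenizer regex, the use of Levenshtein edit distance with a threshold of 1 as the "syntactic similarity" measure, the tiny hard-coded `SYNONYM_GROUPS` table standing in for a lexical database such as WordNet, and the single hypothetical rule "two identifiers differ in exactly one token, and those tokens are similar". POS tagging is omitted entirely.

```python
import re
from itertools import combinations

# Hypothetical stand-in for a lexical database such as WordNet:
# words in the same group are treated as synonyms in this sketch.
SYNONYM_GROUPS = [
    {"get", "fetch", "retrieve"},
    {"delete", "remove", "erase"},
    {"begin", "start"},
]


def tokenize(identifier):
    """Split a Java-style identifier into lower-case word tokens.

    Handles camelCase, acronyms (parseHTMLString) and UPPER_SNAKE_CASE;
    underscores and digits are dropped.
    """
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", identifier)
    return [p.lower() for p in parts]


def levenshtein(a, b):
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def syntactically_similar(a, b, max_distance=1):
    # Assumed rule: distinct tokens of length >= 4 within one edit
    # (e.g. "color"/"colour"); the length floor avoids pairs like "get"/"set".
    return a != b and min(len(a), len(b)) >= 4 and levenshtein(a, b) <= max_distance


def semantically_similar(a, b):
    # Distinct tokens that appear in the same hypothetical synonym group.
    return a != b and any(a in g and b in g for g in SYNONYM_GROUPS)


def find_inconsistent_pairs(identifiers):
    """Flag identifier pairs whose token lists differ in exactly one position,
    where the two differing tokens are syntactically or semantically similar."""
    flagged = []
    for x, y in combinations(identifiers, 2):
        tx, ty = tokenize(x), tokenize(y)
        if len(tx) != len(ty):
            continue
        diffs = [(a, b) for a, b in zip(tx, ty) if a != b]
        if len(diffs) != 1:
            continue
        a, b = diffs[0]
        if syntactically_similar(a, b):
            flagged.append((x, y, "syntactic"))
        elif semantically_similar(a, b):
            flagged.append((x, y, "semantic"))
    return flagged


if __name__ == "__main__":
    names = ["getUserName", "fetchUserName", "setColor", "setColour", "getUserId"]
    for x, y, kind in find_inconsistent_pairs(names):
        print(f"{x} / {y}: {kind}")
```

With the sample names, the sketch reports `getUserName / fetchUserName` as a semantic inconsistency and `setColor / setColour` as a syntactic one, while `getUserName / getUserId` is not flagged because "name" and "id" are neither close in spelling nor listed as synonyms. A real detector would need POS information to avoid flagging pairs that merely look alike.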
