Automatic Identification of Java Method Naming Patterns Using Cascade K-Medoids

  • Kim, Tae-young (Dept. of Software Engineering, CAIIT, Chonbuk National University) ;
  • Kim, Suntae (Dept. of Software Engineering, CAIIT, Chonbuk National University) ;
  • Kim, Jeong-Ah (Department of Computer Education, Catholic Kwandong University) ;
  • Choi, Jae-Young (College of Information and Communication Engineering, SungKyunKwan University) ;
  • Lee, Jee-Huong (College of Information and Communication Engineering, SungKyunKwan University) ;
  • Cho, Youngwha (College of Information and Communication Engineering, SungKyunKwan University) ;
  • Nam, Young-Kwang (Department of Computer and Telecommunications, Yonsei University)
  • Received : 2017.09.29
  • Accepted : 2018.02.13
  • Published : 2018.02.28

Abstract

This paper proposes an automatic approach to extracting Java method implementation patterns associated with method identifiers using Cascade K-Medoids. A Java method implementation pattern is a recurring implementation that achieves the purpose described by the method identifier with the given parameters and return type. When an implementation deviates from that purpose, readers of the code tend to need more time to comprehend the method, which eventually increases software maintenance cost. To automatically identify implementation patterns and their representative sample code, we first propose three groups of feature vectors that characterize the Java method signature, the method body, and the relation between them. We then apply Cascade K-Medoids, which enhances the K-Medoids algorithm with the Calinski-Harabasz method. To evaluate the approach, we identified 16,768 implementation patterns for 7,169 method identifiers from 50 open source projects. The implementation patterns were validated by 30 industrial practitioners with 1 to 6 years of industrial experience, yielding a precision of 86%.
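The abstract does not spell out how K-Medoids and the Calinski-Harabasz method are combined, so the following is only a minimal sketch under one plausible reading: run K-Medoids for each candidate number of clusters k and keep the clustering with the highest Calinski-Harabasz index. Everything named here is an assumption made for illustration, not the authors' implementation: the function names, the Euclidean distance, the deterministic farthest-first seeding, the range of k, and the use of small 2-D points in place of the paper's feature vectors.

```python
import math


def farthest_first_init(points, k):
    # Assumed seeding strategy (not from the paper): start from the most
    # central point, then repeatedly add the point farthest from its
    # nearest already-chosen medoid.
    n = len(points)
    first = min(range(n), key=lambda i: sum(math.dist(points[i], q) for q in points))
    medoids = [first]
    while len(medoids) < k:
        nxt = max(range(n),
                  key=lambda i: min(math.dist(points[i], points[m]) for m in medoids))
        medoids.append(nxt)
    return medoids


def assign(points, medoids):
    # Label each point with the position of its nearest medoid.
    return [min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
            for p in points]


def k_medoids(points, k, max_iter=100):
    # Alternate between assigning points and re-choosing each cluster's medoid
    # (the member with the smallest total distance to the other members).
    medoids = farthest_first_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster becomes empty
                updated.append(medoids[c])
                continue
            updated.append(min(
                members,
                key=lambda i: sum(math.dist(points[i], points[j]) for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(points, medoids)


def calinski_harabasz(points, labels):
    # Ratio of between-cluster to within-cluster dispersion, each divided by
    # its degrees of freedom; larger values indicate better-separated clusters.
    n, dim = len(points), len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2:
        return 0.0

    def mean(pts):
        return [sum(p[d] for p in pts) / len(pts) for d in range(dim)]

    def sq(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = sum(len(pts) * sq(mean(pts), overall) for pts in clusters.values())
    within = sum(sq(p, mean(pts)) for pts in clusters.values() for p in pts)
    if within == 0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))


def cascade_k_medoids(points, k_min=2, k_max=5):
    # Sweep k and keep the clustering with the highest index.
    best = None
    for k in range(k_min, k_max + 1):
        medoids, labels = k_medoids(points, k)
        score = calinski_harabasz(points, labels)
        if best is None or score > best[3]:
            best = (k, medoids, labels, score)
    return best
```

Calling `cascade_k_medoids(points, 2, 5)` returns a tuple of the selected k, the medoid indices, the per-point labels, and the index value. In this sketch each medoid index identifies an actual input point, which is what would let a medoid serve as the representative sample of its cluster; in the paper's setting that point would be a method, but that mapping is not reproduced here.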
