http://dx.doi.org/10.3745/KTCCS.2014.3.6.189

A Plagiarism Detection Technique for Source Codes Considering Data Structures  

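Lee, Kihwa (Suresoft Technologies)
Kim, Yeoneo (Department of Electrical and Computer Engineering, Pusan National University)
Woo, Gyun (Department of Electrical and Computer Engineering, Pusan National University; Administrative Management Department, LG Electronics Smart Control Center)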
Publication Information
KIPS Transactions on Computer and Communication Systems, Vol.3, No.6, pp.189-196, 2014.
Abstract
Although plagiarism is illegal and should be avoided, it still occurs frequently. Source code in particular is plagiarized more often than other works, because its digital nature makes it easy to copy. A variety of studies on preventing code plagiarism have been reported. However, although a program consists of both data structures and algorithms, previous plagiarism detection techniques for source code do not consider data structures. This paper proposes a plagiarism detection technique for source code that takes data structures into account. Specifically, the data structures of two programs are represented as sets of trees, and the two sets are compared using the Hungarian method. To show the usefulness of this technique, an experiment was performed on 126 source code files submitted as homework in an object-oriented programming course. Considering both the data structures and the algorithms of the programs improves precision by 22.6% and the F-measure by 19.3% over considering the algorithms alone.
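The abstract does not say how two sets of trees are scored, so the following Python sketch only illustrates the idea under stated assumptions; it is not the authors' implementation. Each data structure is modeled as a tree of type labels (the hypothetical `Node` class), two trees are scored with an assumed Dice coefficient over their node-label multisets (`tree_similarity`), and the Hungarian method then finds the one-to-one pairing of trees that maximizes total similarity (`set_similarity`, whose normalization by the larger set size is also an assumption). The `hungarian` function is a standard potential-based O(n²·m) implementation of the assignment algorithm.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a type label plus child nodes."""
    label: str
    children: list[Node] = field(default_factory=list)


def labels(tree: Node) -> Counter:
    """Multiset of every node label in the tree."""
    bag = Counter([tree.label])
    for child in tree.children:
        bag += labels(child)
    return bag


def tree_similarity(a: Node, b: Node) -> float:
    """Assumed tree similarity: Dice coefficient of the label multisets."""
    la, lb = labels(a), labels(b)
    common = sum((la & lb).values())
    return 2 * common / (sum(la.values()) + sum(lb.values()))


def hungarian(cost: list[list[float]]) -> list[int]:
    """Minimum-cost assignment for an n x m cost matrix with n <= m.
    Returns match[i] = column assigned to row i."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)        # row potentials (1-indexed)
    v = [0.0] * (m + 1)        # column potentials (1-indexed)
    p = [0] * (m + 1)          # p[j]: row matched to column j, 0 if free
    way = [0] * (m + 1)        # predecessor column on the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:            # grow a shortest augmenting path from row i
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):   # shift potentials by delta
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:     # reached a free column
                break
        while j0:              # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def set_similarity(xs: list[Node], ys: list[Node]) -> float:
    """Pair trees one-to-one so total similarity is maximal (assumed scoring)."""
    if not xs or not ys:
        return 0.0
    sim = [[tree_similarity(x, y) for y in ys] for x in xs]
    if len(xs) > len(ys):      # hungarian() expects rows <= columns
        sim = [list(col) for col in zip(*sim)]
    cost = [[1.0 - s for s in row] for row in sim]
    match = hungarian(cost)
    total = sum(sim[i][j] for i, j in enumerate(match))
    return total / max(len(xs), len(ys))


# Two programs whose classes hold the same field types, declared in a different order.
stack = Node("class", [Node("int[]"), Node("int")])
linked = Node("class", [Node("int"), Node("class-ref")])
copy = [Node("class", [Node("int"), Node("class-ref")]),
        Node("class", [Node("int[]"), Node("int")])]
print(set_similarity([stack, linked], copy))  # 1.0
```

Minimizing the summed cost `1 - s` over a fixed number of matched pairs is equivalent to maximizing the summed similarity, which is why the sketch converts similarities to costs before calling `hungarian`. Because the matching is one-to-one rather than positional, merely reordering the classes of a copied program does not lower its score.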
Keywords
Program Plagiarism Detection; Static Analysis; Program Similarity; Similarity on Data Structures