Generating Phylogenetic Tree of Homogeneous Source Code in a Plagiarism Detection System

Ji, Jeong-Hoon (Graduate School of Computer Engineering, Pusan National University)
Park, Su-Hyun (Graduate School of Computer Engineering, Pusan National University)
Woo, Gyun (Graduate School of Computer Engineering, Pusan National University)
Cho, Hwan-Gue (Graduate School of Computer Engineering, Pusan National University)
Publication Information
International Journal of Control, Automation, and Systems / v.6, no.6, 2008, pp. 809-817
Abstract
Program plagiarism is widespread due to intelligent software and the global Internet environment. Consequently, the detection of plagiarized source code and software is becoming important, especially in academia. Although numerous studies have addressed the detection of plagiarized pairs of programs, we are not aware of any in-depth work on the underlying mechanisms of plagiarism. In this paper, we study the evolutionary process of source code, treating the plagiarism procedure as a sequence of evolutionary steps. Our final goal is to reconstruct a tree depicting the evolution of the source code. To this end, we extend a well-known bioinformatics technique, local alignment, with an adaptive scoring matrix to detect regions of similar code. The asymmetric code similarity based on local alignment is one of the main contributions of this paper. The phylogenetic tree, or evolution tree, of source codes can be reconstructed using this asymmetric measure. To show the effectiveness and efficiency of the phylogeny construction algorithm, we conducted experiments with more than 100 real source codes obtained from the East-Asia ICPC (International Collegiate Programming Contest). Our experiments showed that the proposed algorithm is quite successful in reconstructing the evolutionary direction, which enables us to identify plagiarized codes more accurately and reliably. The phylogeny construction algorithm has also been implemented on top of the plagiarism detection system of an automatic program evaluation system.
Keywords
Asymmetric local alignment; evolution process; phylogeny of source codes; plagiarism detection; source code similarity
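
The abstract names three ingredients: local alignment of code, an asymmetric similarity derived from it, and a tree built from that measure. The following is a minimal sketch of how these could fit together, not the authors' implementation. Every detail is an assumption made for illustration: the fixed match, mismatch, and gap scores stand in for the paper's adaptive scoring matrix, the normalization by a program's self-alignment score is a hypothetical way to obtain asymmetry, and the greedy tree growth from a caller-chosen root is a simple stand-in for the paper's phylogeny construction algorithm. The function names and the toy token sequences are likewise hypothetical.

```python
from typing import Dict, Sequence


def local_alignment_score(a: Sequence[str], b: Sequence[str],
                          match: int = 2, mismatch: int = -1,
                          gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score of two token sequences.

    Assumption: fixed scores replace the paper's adaptive scoring matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def asymmetric_similarity(a: Sequence[str], b: Sequence[str],
                          match: int = 2) -> float:
    """Share of a's self-alignment score that is recovered by aligning with b.

    Assumption: this normalization is illustrative. It is asymmetric because
    the denominator depends only on the first argument.
    """
    if not a:
        return 0.0
    return local_alignment_score(a, b, match=match) / (match * len(a))


def build_tree(programs: Dict[str, Sequence[str]], root: str) -> Dict[str, str]:
    """Greedy, Prim-style tree growth; returns a child -> parent map.

    Assumption: the root is supplied by the caller, and each step attaches
    the remaining program that is best covered by a program already in the tree.
    """
    parent: Dict[str, str] = {}
    in_tree = [root]
    remaining = [name for name in programs if name != root]
    while remaining:
        _, p, c = max(
            (asymmetric_similarity(programs[c], programs[p]), p, c)
            for p in in_tree for c in remaining
        )
        parent[c] = p
        in_tree.append(c)
        remaining.remove(c)
    return parent


# Toy token streams: B extends A, and C extends B.
programs = {
    "A": list("abcdef"),
    "B": list("abcdefgh"),
    "C": list("abcdefghij"),
}
print(asymmetric_similarity(programs["A"], programs["B"]))
print(asymmetric_similarity(programs["B"], programs["A"]))
print(build_tree(programs, root="A"))
```

In this sketch the direction of each edge follows from the chosen root, and the asymmetric measure only decides which parent a program attaches to. Inferring the root and the copy direction from the data, as the abstract describes, would need the paper's actual algorithm.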