
Program Plagiarism Detection based on X-treeDiff+  

Lee, Suk-Kyoon (Department of Computer Science and Engineering, College of Engineering, Dankook University)
Abstract
Program plagiarism is a significant factor in lowering the quality of education in computer programming. In this paper, we propose a technique for identifying similar or identical programs, in order to deter students from recklessly copying their programming assignments. Existing approaches to identifying similar programs are mainly based on fingerprinting or pattern matching over text documents. In contrast, we propose an approach based on program structure. Using parsing programs, we first transform programs into XML documents by representing the syntactic components of each program as elements of an XML document; we then run X-tree Diff+, a change detection algorithm for XML documents, to produce an edit script that describes the changes. Whether two programs are similar or identical is decided by analyzing the edit script from the standpoint of program plagiarism. This analysis lets users understand how one program is converted into the other, so that they can make a qualitative judgement that takes into account the characteristics of the programming assignment and the degree of plagiarism.
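The pipeline described in the abstract (parse, convert to XML, diff, inspect the edit script) can be illustrated with a minimal sketch. This is not X-tree Diff+ and not the authors' implementation: it assumes Python's standard `ast` module in place of a dedicated parser, and it swaps in `difflib.SequenceMatcher` over the preorder list of element names in place of a real tree change-detection algorithm. The names `to_xml`, `preorder_tags`, and `edit_script` are hypothetical.

```python
from __future__ import annotations

import ast
import difflib
import xml.etree.ElementTree as ET


def to_xml(node: ast.AST) -> ET.Element:
    """Represent each syntactic component (AST node) as an XML element."""
    elem = ET.Element(type(node).__name__)
    for child in ast.iter_child_nodes(node):
        elem.append(to_xml(child))
    return elem


def preorder_tags(root: ET.Element) -> list[str]:
    """Flatten the XML tree into its element names in document order."""
    return [e.tag for e in root.iter()]


def edit_script(old_src: str, new_src: str) -> list[tuple[str, list[str], list[str]]]:
    """Return the non-matching operations needed to turn old_src into new_src.

    Assumption: a sequence diff over element names stands in for a tree diff.
    """
    a = preorder_tags(to_xml(ast.parse(old_src)))
    b = preorder_tags(to_xml(ast.parse(new_src)))
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (op, a[i1:i2], b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = "def acc(values):\n    r = 0\n    for v in values:\n        r += v\n    return r\n"
rewritten = "def total(xs):\n    return sum(xs)\n"

print(edit_script(original, renamed))    # identifiers are ignored, so this prints []
print(edit_script(original, rewritten))  # structural changes yield a non-empty script
```

Because only element names are compared, renaming identifiers leaves the edit script empty, while a structural rewrite produces insert, delete, or replace operations that a reviewer can inspect. A text-based comparison would instead flag every renamed line as changed, which is the kind of weakness a structure-based approach aims to avoid.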
Keywords
X-tree Diff+; XML