Applying Genomic Sequence Alignment Methodology for Source Codes Plagiarism Detection  

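Eun-Mi Kang (Department of Computer Science, Pusan National University)
Mi-Nyeong Hwang (Korea Institute of Science and Technology Information)
Hwan-Gue Cho (School of Computer Science and Engineering, Pusan National University)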
Abstract
The syntactic and semantic characteristics of a computer program can be represented by the keyword sequence extracted from its source code. The similarities and differences between two programs can therefore be identified by comparing their keyword sequences. Methods for measuring the similarity of two sequences have already been studied intensively in bioinformatics, where they are used to analyze biological genetic sequences. In this paper, we propose a new method that exploits sequence alignment techniques to measure the similarity of two programs and to detect partial plagiarism. To evaluate the performance of the proposed method, we experimented with actual program code submitted by 70 students attending a Data Structure course in 2001. The experimental results show that the proposed method is more effective and powerful than the fingerprint method, which is the most commonly used method for plagiarism detection.
Keywords
sequence alignment; plagiarism detection;
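The idea described in the abstract, comparing the keyword sequences of two programs with a sequence alignment technique, can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword set `KEYWORDS`, the scoring values (`match`, `mismatch`, `gap`), and the normalization in `similarity` are assumptions chosen for illustration, and Smith-Waterman local alignment is used here as one standard alignment technique that can find a copied fragment inside a longer program.

```python
import re

# Hypothetical keyword set; the abstract does not list the keywords actually used.
KEYWORDS = {"if", "else", "for", "while", "return",
            "int", "char", "void", "struct", "break"}


def keyword_sequence(source):
    """Return the keywords found in `source`, in order of appearance."""
    tokens = re.findall(r"[A-Za-z_]\w*", source)
    return [tok for tok in tokens if tok in KEYWORDS]


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score between sequences a and b.

    The scoring values are illustrative assumptions, not the paper's.
    """
    best = 0
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def similarity(src_a, src_b, match=2):
    """Local alignment score divided by the best score the shorter sequence could reach."""
    a, b = keyword_sequence(src_a), keyword_sequence(src_b)
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return local_alignment_score(a, b, match=match) / (match * shorter)
```

Because the alignment is local, a program whose keyword sequence appears intact inside a longer program still receives a high score, which is the behavior needed for partial plagiarism. Identifier names and literals are ignored, so renaming variables does not change the result in this sketch.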