Developing of Text Plagiarism Detection Model using Korean Corpus Data  

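Ryu, Chang-Keon (Department of Computer Engineering, Pusan National University)
Kim, Hyong-Jun (Department of Computer Engineering, Pusan National University)
Cho, Hwan-Gue (Department of Computer Engineering, Pusan National University)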
Abstract
Several plagiarism scandals involving academic papers and novels have recently come to light, and document plagiarism is becoming both more frequent and more serious. Although plagiarism in English text has been studied for a long time, systematic and complete studies of plagiarism in Korean documents are hard to find. Because the linguistic features of Korean differ substantially from those of English, English-based methods cannot be applied to Korean documents directly. In this paper, we propose a new plagiarism detection method for Korean and test our algorithm thoroughly on a benchmark Korean text corpus. The proposed method is based on "k-mer" matching and "local alignment", which together locate the plagiarized regions of a document pair quickly and accurately. Using a Korean corpus of more than 10 million words, we establish a probability model for the local alignment score (that is, for the similarity that arises by chance). Experiments show that our system detects plagiarized documents successfully.
Keywords
Plagiarism detection; Korean text corpus; Information retrieval
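
The abstract names two building blocks, k-mer matching and local alignment, without giving details. The following is a minimal sketch of how the two ideas are commonly combined, not the authors' implementation. It assumes that documents are split into whitespace-separated tokens, that a k-mer is a window of k consecutive tokens, and that local alignment is done with the standard Smith-Waterman recurrence. The function names and the scoring parameters (`match`, `mismatch`, `gap`) are hypothetical choices made for illustration.

```python
def kmers(tokens, k):
    """Yield (position, k-mer) for every window of k consecutive tokens."""
    for i in range(len(tokens) - k + 1):
        yield i, tuple(tokens[i:i + k])


def shared_kmer_count(a, b, k=2):
    """Count the k-mers of b that also occur in a (a cheap pre-filter)."""
    seen = {kmer for _, kmer in kmers(a, k)}
    return sum(1 for _, kmer in kmers(b, k) if kmer in seen)


def local_alignment(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment over two token lists.

    Returns (best_score, (a_start, a_end), (b_start, b_end)),
    where both ranges are half-open index ranges into a and b.
    The scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming table; scores never drop below zero,
    # which is what makes the alignment local rather than global.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            s = max(0,
                    score[i - 1][j - 1] + sub,
                    score[i - 1][j] + gap,
                    score[i][j - 1] + gap)
            score[i][j] = s
            if s > best:
                best, best_pos = s, (i, j)

    # Trace back from the best cell until the score reaches zero.
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])


if __name__ == "__main__":
    doc_a = "x y the quick brown fox z".split()
    doc_b = "p the quick brown fox q r".split()
    print(shared_kmer_count(doc_a, doc_b, k=2))
    print(local_alignment(doc_a, doc_b))
```

In a pipeline of this kind, the k-mer count would typically serve as a fast filter that selects candidate document pairs, and the more expensive alignment would then run only on those candidates. An alignment score could be judged against the distribution of scores obtained from unrelated document pairs, which is the role the abstract assigns to the corpus-based probability model.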