Similarity Comparison Analysis Using Pattern Matching

A Similarity Valuating System using The Pattern Matching

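  • Bang-Won Ko (고방원), School of Computing, College of Information Science, Soongsil University;
  • Young-Chul Kim (김영철), Department of E-Commerce, Yuhan College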
  • Received: 2009.05.28
  • Accepted: 2010.01.26
  • Published: 2010.01.31

Abstract

This paper proposes a system that evaluates the similarity of two different documents by matching the patterns that appear in them. Existing methods for evaluating document similarity have relied mainly on statistical techniques such as fingerprinting. These methods have an accuracy problem: when two unrelated documents happen to contain many similar words, the reported similarity is high. The problem arises because only statistical figures for the two documents are compared. The pattern-based method presented here avoids it, because similarity is judged by searching for patterns that actually match. Its drawback is that searching for patterns takes a long time, so the paper also introduces an algorithm that reduces this cost.
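The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the abstract does not define a "pattern" or name the statistical measure, so the sketch assumes a pattern is a run of three consecutive words and uses cosine similarity of word counts as a stand-in for a purely statistical comparison. All function names are hypothetical.

```python
from collections import Counter
import math


def tokens(text):
    """Lowercase whitespace tokenization (an assumption; the paper's tokenizer is not given)."""
    return text.lower().split()


def word_frequency_similarity(a, b):
    """Cosine similarity of word-count vectors: a stand-in for a statistical comparison."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def patterns(text, n=3):
    """All runs of n consecutive words (a hypothetical definition of 'pattern')."""
    t = tokens(text)
    return {tuple(t[i:i + n]) for i in range(len(t) - n + 1)}


def pattern_similarity(a, b, n=3):
    """Jaccard overlap of the two pattern sets: shared patterns / all patterns."""
    pa, pb = patterns(a, n), patterns(b, n)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


# Two sentences with different meanings that use exactly the same words.
doc_a = "the dog bit the man"
doc_b = "the man bit the dog"

print("word-frequency similarity:", word_frequency_similarity(doc_a, doc_b))
print("pattern similarity:", pattern_similarity(doc_a, doc_b))
```

Because the two sentences contain the same words with the same counts, the word-frequency score reaches its maximum, while no three-word run is shared and the pattern score is zero. The sketch compares every pattern set in full, which hints at why pattern search is the expensive step the abstract mentions.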

Keywords
