http://dx.doi.org/10.9723/jksiis.2017.22.2.043

A Study on Text Pattern Analysis Applying Discrete Fourier Transform - Focusing on Sentence Plagiarism Detection -  

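Lee, Jung-Song (Division of Electronics and Information Engineering, Chonbuk National University)
Park, Soon-Cheol (Division of Computer Science and Engineering, Chonbuk National University)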
Publication Information
Journal of Korea Society of Industrial Information Systems / v.22, no.2, 2017, pp. 43-52
Abstract
Pattern analysis is one of the most important techniques in signal processing, image processing, and text mining. The discrete Fourier transform (DFT) is generally used to analyze patterns in signals and images, and we expected that it could also be applied to the analysis of text patterns. In this paper, the DFT is applied for the first time to sentence plagiarism detection, which determines whether text patterns of one document exist in other documents. We convert texts into signals by mapping them to ASCII codes and apply the cross-correlation method to detect simple text plagiarism such as cut-and-paste and term relocation. WordNet is used to measure similarity in order to detect plagiarism that relies on synonyms, translation, summarization, and the like. Our experiments use the 2013 corpus provided by PAN, one of the well-known workshops on text plagiarism. Our method ranked fourth among the eleven most outstanding plagiarism detection methods.
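The signal-conversion and cross-correlation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`dft`, `text_to_signal`, `cross_correlation`, `best_match`) are hypothetical, the DFT is a naive O(n^2) version written with the standard library only, and the squared-distance score built on top of the cross-correlation is an assumption chosen so that an exact cut-and-paste match scores zero. The WordNet-based similarity step is not covered.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse when requested)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = 0j
        for t in range(n):
            s += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(s / n if inverse else s)
    return out


def text_to_signal(text):
    """Map each character to its code point (the ASCII code for ASCII text)."""
    return [float(ord(ch)) for ch in text]


def cross_correlation(pattern, document):
    """Return corr[k] = sum_i pattern[i] * document[i + k] for every lag k
    at which the pattern fits entirely inside the document.

    Uses the correlation theorem: corr = IDFT(conj(DFT(pattern)) * DFT(document)).
    """
    n, m = len(document), len(pattern)
    if m > n:
        raise ValueError("pattern must not be longer than document")
    padded = pattern + [0.0] * (n - m)  # zero-pad the pattern to length n
    P, D = dft(padded), dft(document)
    product = [pk.conjugate() * dk for pk, dk in zip(P, D)]
    corr = dft(product, inverse=True)
    # Lags 0..n-m never wrap around, so they equal the linear correlation.
    return [c.real for c in corr[: n - m + 1]]


def best_match(pattern_text, document_text):
    """Return (offset, squared_distance) of the best-aligned window.

    Assumed scoring: ||p - d_k||^2 = ||p||^2 + ||d_k||^2 - 2 * corr[k].
    """
    p = text_to_signal(pattern_text)
    d = text_to_signal(document_text)
    m = len(p)
    corr = cross_correlation(p, d)
    p_energy = sum(v * v for v in p)
    prefix = [0.0]  # prefix sums of squared document samples
    for v in d:
        prefix.append(prefix[-1] + v * v)
    best_k, best_dist = 0, float("inf")
    for k, c in enumerate(corr):
        dist = p_energy + (prefix[k + m] - prefix[k]) - 2.0 * c
        if dist < best_dist:
            best_k, best_dist = k, dist
    return best_k, best_dist


if __name__ == "__main__":
    doc = "the quick brown fox jumps over the lazy dog"
    offset, dist = best_match("brown fox", doc)
    print(offset, doc[offset:offset + len("brown fox")])
```

For the sample document, the copied phrase starts at character offset 10, and the squared distance there is zero up to floating-point rounding. A practical system would replace the naive `dft` with an FFT routine and would need a threshold on the distance to flag near matches; both choices are outside what the abstract specifies.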
Keywords
Discrete Fourier Transform; Sentence Plagiarism Detection; Text Signal; Cross-Correlation; WordNet
Citations & Related Records
Times Cited By KSCI: 5
1 Cetin, E., Morling, R. C., and Kale, I. "An Integrated 256-Point Complex FFT Processor for Real-Time Spectrum Analysis and Measurement," Instrumentation and Measurement Technology Conference, pp. 96-101, 1997.
2 Briggs, W. L. and Henson, V. E., The DFT: an Owners' Manual for the Discrete Fourier Transform, Society for Industrial and Applied Mathematics, 1995.
3 Howell, K. B., Principles of Fourier Analysis, CRC Press, 2001.
4 Lynn, P. A. and Fuerst, W., Introductory Digital Signal Processing with Computer Applications, John Wiley, 1998.
5 Lee, C. H., "A Pattern Matching Algorithm using Correlation in Fourier Domain," Journal of Korea Multimedia Society, Vol. 7, No. 9, pp. 1255-1262, 2004.
6 Han, J. Y., Cho, C. H., and Son, I. S., "An Empirical Study on Corporate use of Big Data: The Case of Integrated Customer Log System at a Korean Home Shopping Firm," Journal of Internet Electronic Commerce Research, Vol. 15, No. 6, pp. 1-19, 2015.
7 Hwang, I. S., "A Study on Plagiarism Detection and Document Classification using Association Analysis," Journal of Information Systems, Vol. 23, No. 3, pp. 127-142, 2014.
8 Lyon, C., Malcolm, J., and Dickerson, B., "Detecting Short Passages of Similar Text in Large Document Collections," International Conference on Empirical Methods in Natural Language Processing, pp. 118-125, 2001.
9 Lewis, J. P., "Fast Template Matching," Vision Interface, Vol. 95, No. 120123, pp. 15-19, 1995.
10 Smith, J. O., Mathematics of the Discrete Fourier Transform (DFT): with Audio Applications, Julius Smith, 2007.
11 Miller, G. A., "WordNet: A Lexical Database for English," Communications of the ACM, Vol. 38, No. 11, pp. 39-41, 1995.
12 Wu, Z. and Palmer, M., "Verb Semantics and Lexical Selection," 32nd Annual Meeting on Association for Computational Linguistics, pp. 133-138, 1994.
13 Resnik, P., "Using Information Content to Evaluate Semantic Similarity in a Taxonomy," 14th International Joint Conference on Artificial Intelligence, 1995.
14 Banerjee, S. and Pedersen, T., "An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet," International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, 2002.
15 Jiang, J. J. and Conrath, D. W., "Semantic Similarity based on Corpus Statistics and Lexical Taxonomy," International Conference Research on Computational Linguistics, 1997.
16 Leacock, C. and Chodorow, M., "Combining Local Context and Wordnet Similarity for Word Sense Identification," WordNet: An Electronic Lexical Database, Vol. 49, No. 2, pp. 265-283, 1998.
17 Lin, D., "An Information-Theoretic Definition of Similarity," International Conference on Machine Learning, Vol. 98, pp. 296-304, 1998.
18 Cheema, W. A., Najib, F., Ahmed, S., Bukhari, S. H., Sittar, A., and Nawab, R. M. A., "A Corpus for Analyzing Text Reuse by People of Different Groups," 5th International Conference of the CLEF Initiative, 2014.
19 Potthast, M., Hagen, M., Gollub, T., Tippmann, M., Kiesel, J., Rosso, P., and Stein, B., "Overview of the 5th International Competition on Plagiarism Detection," Conference on Multilingual and Multimodal Information Access Evaluation, pp. 301-331, 2013.
20 Potthast, M., Stein, B., Barron-Cedeno, A., and Rosso, P., "An Evaluation Framework for Plagiarism Detection," 23rd International Conference on Computational Linguistics, pp. 997-1005, 2010.
21 Lee, J. K. and Kim, K. J., "Educational Contents and Implementation Procedures of the Training System for Research Ethics," Journal of the Korea Industrial Information Systems Research, Vol. 15, No. 5, pp. 235-246, 2010.
22 Lee, J. S. and Park, S. C., "The Document Clustering using Multi-Objective Genetic Algorithms," Journal of the Korea Industrial Information Systems Research, Vol. 17, No. 2, pp. 57-64, 2012.
23 Choi, L. C., Park, S. C., and Song, W., "Comparison of Document Clustering algorithm using Genetic Algorithms by Individual Structures," Journal of the Korea Industrial Information Systems Research, Vol. 16, No. 3, pp. 47-56, 2011.