http://dx.doi.org/10.22937/IJCSNS.2022.22.8.30

Modern Methods of Text Analysis as an Effective Way to Combat Plagiarism  

Myronenko, Serhii (National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute")
Myronenko, Yelyzaveta (National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute")
Publication Information
International Journal of Computer Science & Network Security, v.22, no.8, 2022, pp. 242-248
Abstract
The article analyzes modern methods for automatically comparing original and unoriginal text to detect textual plagiarism. The study covers two types of plagiarism: literal, in which the plagiarist copies text verbatim without changing anything, and intelligent, which relies on more sophisticated text manipulation, such as replacing words and symbols, and is therefore harder to detect. Standard extrinsic detection techniques are string-based, vector space and semantic-based. The first, most common and most successful target models for detecting literal plagiarism, N-gram and Vector Space, are analyzed, and their advantages and disadvantages are evaluated. The most effective target models for detecting intelligent plagiarism, in particular those that identify paraphrases by measuring the semantic similarity of short text components, are investigated. Neural network models based on natural language sentence matching approaches are considered, namely the Densely Interactive Inference Network (DIIN), Bilateral Multi-Perspective Matching (BiMPM), and Bidirectional Encoder Representations from Transformers (BERT) and its family of models. Progress in improving plagiarism detection systems, techniques and related models is summarized. Relevant and urgent problems that remain unresolved in detecting intelligent plagiarism, namely effective recognition of unoriginal ideas and of well-paraphrased text, are outlined.
Keywords
Literal and intelligent plagiarism; extrinsic detection; techniques; target models; backbone neural architectures
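
The two target models the abstract names for literal plagiarism, N-gram and Vector Space, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenization, the use of Jaccard overlap for n-gram sets, and raw term-frequency weighting for the vectors are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter


def word_ngrams(text, n=3):
    """Return the set of word n-grams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_jaccard(a, b, n=3):
    """N-gram model (assumed variant): Jaccard overlap of word n-gram sets."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def cosine_similarity(a, b):
    """Vector space model (assumed variant): cosine of raw term-frequency vectors."""
    tf_a, tf_b = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical example sentences, not taken from the article.
source = "the quick brown fox jumps over the lazy dog"
literal_copy = "the quick brown fox jumps over the lazy dog"
paraphrase = "a fast brown fox leaps over a sleepy dog"

for label, suspect in [("literal", literal_copy), ("paraphrase", paraphrase)]:
    print(label,
          round(ngram_jaccard(source, suspect), 2),
          round(cosine_similarity(source, suspect), 2))
```

On these toy sentences both measures score the literal copy at or near 1, while the paraphrase shares no word trigram with the source and keeps only a weak cosine score from the few unchanged words. That gap is the kind of limitation that motivates the semantic, sentence-matching models (DIIN, BiMPM, BERT) discussed for intelligent plagiarism.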