Search | Korea Science

Modern Methods of Text Analysis as an Effective Way to Combat Plagiarism

Myronenko, Serhii; Myronenko, Yelyzaveta
International Journal of Computer Science & Network Security / v.22 no.8 / pp.242-248 / 2022
The article analyzes modern methods for automatically comparing original and unoriginal text to detect textual plagiarism. The study covers two types of plagiarism: literal, in which plagiarists copy the text exactly without changing anything, and intelligent, which uses more sophisticated techniques that are harder to detect because the text is manipulated, for example by replacing words and symbols. Standard techniques for extrinsic detection are string-based, vector space and semantic-based. The most common and most successful models for detecting literal plagiarism, N-gram and Vector Space, are analyzed first, and their advantages and disadvantages are evaluated. The most effective models for detecting intelligent plagiarism, particularly those that identify paraphrases by measuring the semantic similarity of short components of the text, are then investigated. Models that use neural network architectures and are based on natural language sentence matching approaches, such as Densely Interactive Inference Network (DIIN), Bilateral Multi-Perspective Matching (BiMPM) and Bidirectional Encoder Representations from Transformers (BERT) and its family of models, are considered. Progress in improving plagiarism detection systems, techniques and related models is summarized. Relevant and urgent problems that remain unresolved in detecting intelligent plagiarism, namely effective recognition of unoriginal ideas and of skillfully paraphrased text, are outlined.
https://doi.org/10.22937/IJCSNS.2022.22.8.30
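
As a rough illustration of the two literal-plagiarism model families named in the abstract above, the sketch below compares two texts with word n-gram overlap (string-based) and with cosine similarity of term-frequency vectors (vector space). It is not taken from the article: the function names, the regex tokenizer, and the default n=3 are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def word_ngrams(tokens, n=3):
    """Return the set of contiguous word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_jaccard(a, b, n=3):
    """Jaccard overlap of word n-gram sets (string-based comparison)."""
    grams_a = word_ngrams(tokenize(a), n)
    grams_b = word_ngrams(tokenize(b), n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def cosine_similarity(a, b):
    """Cosine of raw term-frequency vectors (vector space comparison)."""
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Reordering the clauses of a sentence lowers the n-gram score while leaving the bag-of-words cosine unchanged, which is one reason the two families are usually evaluated separately.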

Implementation of A Plagiarism Detecting System with Sentence and Syntactic Word Similarities (문장 및 어절 유사도를 이용한 표절 탐지 시스템 구현)

Maeng, Joosoo; Park, Ji Su; Shon, Jin Gon
KIPS Transactions on Software and Data Engineering / v.8 no.3 / pp.109-114 / 2019
The similarity detection method used in most plagiarism detection systems relies on the frequency of shared words obtained through morphological analysis. However, this method has limitations in measuring the degree of similarity accurately, especially when similar words concerning the same topics are used, sentences are excerpted in separate parts, or postpositions and word endings are similar. To overcome this problem, we have designed and implemented a plagiarism detection system that provides more reliable similarity information by measuring sentence similarity and syntactic word similarity in addition to the conventional word similarity. We compared our system with a conventional system that uses only word similarity. The comparative experiment showed that our system can detect plagiarized documents regardless of whether the conventional system can detect them.
https://doi.org/10.3745/KTSDE.2019.8.3.109

Program Plagiarism Detection through Memory Access Log Analysis (메모리 액세스 로그 분석을 통한 프로그램 표절 검출)

Park, Sung-Yun; Han, Sang-Yong
The KIPS Transactions:PartD / v.13D no.6 s.109 / pp.833-838 / 2006
Program plagiarism is an infringement of software copyright. Many different source program comparison methods have been studied for detecting program plagiarism. However, it is not easy to detect a plagiarized program in which a few cosmetic changes have been made to program structures and variable names. In this paper, we propose a new, ground-breaking technique for detecting plagiarism by memory access log analysis.
https://doi.org/10.3745/KIPSTD.2006.13D.6.833

Plagiarism Detection among Source Codes using Adaptive Methods

Lee, Yun-Jung; Lim, Jin-Su; Ji, Jeong-Hoon; Cho, Hwan-Gue; Woo, Gyun
KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.6 / pp.1627-1648 / 2012
We propose an adaptive method for detecting plagiarized pairs in a large set of source code. The method is adaptive in that it uses an adaptive algorithm and provides an adaptive threshold for determining plagiarism. Conventional algorithms are based on greedy string tiling or on local alignments of two code strings. However, most of them are not adaptive: they do not consider the characteristics of the program set, which causes a problem for a program set in which all the programs are inherently similar. We propose adaptive local alignment, a variant of local alignment that uses an adaptive similarity matrix. Each entry of this matrix is the logarithm of the probabilities of the keywords based on their frequency in a given program set. We also propose an adaptive threshold based on the local outlier factor (LOF), which represents the likelihood of an entity being an outlier. Experimental results indicate that our method is more sensitive than JPlag, which uses greedy string tiling, in detecting plagiarism-suspected code pairs. Further, the adaptive threshold based on the LOF is shown to be effective: detection achieves high sensitivity with negligible loss of specificity compared with a fixed threshold.
https://doi.org/10.3837/tiis.2012.06.008
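
The abstract above says only that each entry of the adaptive similarity matrix is a logarithm of keyword probabilities estimated from the program set; it does not give the formula. The sketch below shows one plausible reading, assumed here purely for illustration: the reward for matching a keyword is the negative log of its relative frequency, so keywords that every submission contains contribute little and rare keywords contribute more. The function name `adaptive_match_scores` and the token lists are hypothetical.

```python
import math
from collections import Counter


def adaptive_match_scores(token_sequences):
    """Map each keyword to -log(p), where p is its relative frequency
    across the whole program set (an assumed form of the adaptive score,
    not the formula from the paper)."""
    counts = Counter(tok for seq in token_sequences for tok in seq)
    total = sum(counts.values())
    return {tok: -math.log(count / total) for tok, count in counts.items()}


# Hypothetical keyword sequences extracted from three submissions.
programs = [
    ["for", "if", "return"],
    ["for", "if", "while", "return"],
    ["for", "for", "goto"],
]
scores = adaptive_match_scores(programs)
```

In this toy set `for` accounts for 4 of 10 tokens and `goto` for 1 of 10, so a shared `goto` would count for more than a shared `for` when these scores are used as match rewards in a local alignment.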

Development of A Plagiarism Detection System Using Web Search and Morpheme Analysis (인터넷 검색과 형태소분석을 이용한 표절검사시스템의 개발에 관한 연구)

Hwang, In-Soo
Journal of Information Technology Applications and Management / v.16 no.1 / pp.21-36 / 2009
As the World Wide Web (WWW) has become a major channel for information delivery, the data accumulated on the Internet is increasing at an incredible speed, and this drives advances in information search technologies. It is the search engine that solves the problem of information overload and helps people identify relevant information. However, as search engines have become a powerful tool for finding information, the opportunities for plagiarism in e-Learning have increased significantly. In this paper, we developed an online plagiarism detection system that incorporates the functions of search engines and works in the same way that plagiarists do. The system uses morpheme analysis to improve performance and sentence-based comparison to examine documents drawn from multiple sources. When the system was applied in e-Learning, plagiarism detection performance improved.

Developing of Text Plagiarism Detection Model using Korean Corpus Data (한글 말뭉치를 이용한 한글 표절 탐색 모델 개발)

Ryu, Chang-Keon; Kim, Hyong-Jun; Cho, Hwan-Gue
Journal of KIISE:Computing Practices and Letters / v.14 no.2 / pp.231-235 / 2008
Recently we have witnessed a few plagiarism scandals involving academic papers and novels. Plagiarism in documents is becoming both more serious and more frequent. Although plagiarism in English has been studied for a long time, systematic and complete studies of plagiarism in Korean documents are hard to find. Since the linguistic features of Korean are quite different from those of English, we cannot apply English-based methods to Korean documents directly. In this paper, we propose a new plagiarism detection method for Korean, and we thoroughly tested our algorithm with one benchmark Korean text corpus. The proposed method is based on "k-mer" and "local alignment", which locate the region of plagiarized document pairs quickly and accurately. Using a Korean corpus that contains more than 10 million words, we establish a probability model for the local alignment score (random similarity by chance). The experiment showed that our system was quite successful in detecting the plagiarized documents.
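
The k-mer step mentioned in the abstract above can be pictured as indexing every length-k substring of one document and looking up the k-mers of the other, which yields candidate positions that a local alignment could then extend. The sketch below is a generic character-level version written for illustration; the function names, the choice of k=4, and the use of raw characters as units are assumptions, not details from the paper.

```python
from collections import defaultdict


def kmer_index(text, k=4):
    """Map every length-k substring to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - k + 1):
        index[text[i:i + k]].append(i)
    return index


def shared_kmer_seeds(source, suspect, k=4):
    """Return (source_pos, suspect_pos) pairs where the same k-mer occurs."""
    index = kmer_index(source, k)
    seeds = []
    for j in range(len(suspect) - k + 1):
        # .get avoids adding empty entries to the defaultdict on a miss
        for i in index.get(suspect[j:j + k], ()):
            seeds.append((i, j))
    return seeds
```

Because Python strings are sequences of Unicode code points, the same functions work on Hangul text, where each k-mer then spans k syllable blocks.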

Facilitating Conditions and the Use of Plagiarism Detection Software by Postgraduates of the University of Ibadan, Oyo State, Nigeria

Oluwaseun Jolayemi; Olawale Oyewole; Oluwatosin Oladejo
International Journal of Knowledge Content Development & Technology / v.14 no.3 / pp.39-57 / 2024
Plagiarism detection software is beneficial in detecting plagiarism in research works of postgraduate students. Despite the benefits of using plagiarism detection software, studies have revealed that most students, including postgraduates, do not use plagiarism detection software as expected. This could depend on the provision of facilitating conditions like internet connectivity, training opportunities and electricity. Thus, this study examined facilitating conditions and the use of plagiarism detection software among postgraduates of the University of Ibadan, Nigeria. A descriptive survey research design of the correlational type was used for this study, with a population of 2143 postgraduates. The multi-stage random sampling technique was used to determine the sample size of 242. The questionnaire was the research instrument, and data was analysed using descriptive statistics. Results showed that most postgraduates agreed that the university provided facilitating conditions like internet connectivity. The majority of the respondents noted that they used Turnitin monthly. Most of the respondents noted that they used plagiarism detection software to paraphrase their work and check the correctness of the grammar in their documents. The most prominent challenges confronting plagiarism detection software use by most respondents were their inability to afford subscription payment to use the plagiarism detection software and slow internet connectivity. There was a significant positive relationship between facilitating conditions and the use of plagiarism detection software by the postgraduates of the University of Ibadan, Nigeria. Some of the recommendations for the institution's management include leveraging the vast network of alumni willing to give back to the institution and intervening in the provision of internet connectivity and electricity.
https://doi.org/10.5865/IJKCT.2024.14.3.039

An Adaptive Algorithm for Plagiarism Detection in a Controlled Program Source Set (제한된 프로그램 소스 집합에서 표절 탐색을 위한 적응적 알고리즘)

Ji, Jeong-Hoon; Woo, Gyun; Cho, Hwan-Gue
Journal of KIISE:Software and Applications / v.33 no.12 / pp.1090-1102 / 2006
This paper suggests a new algorithm for detecting plagiarism among a set of source codes constrained to be functionally equivalent, such as those submitted for a programming assignment or a programming contest problem. The typical algorithms used so far are based on Greedy-String Tiling, which seeks perfect matches of substrings, and on analysis of similarity between strings based on the local alignment of the two strings. This paper introduces a new method for detecting similar intervals of the given programs based on an adaptive similarity matrix, each entry of which is the logarithm of the probabilities of the keywords based on their frequencies in the given set of programs. We tested this method on a set of programs submitted for more than 10 real programming contests. According to the experimental results, this method has several advantages over the previous one, which uses a fixed similarity matrix (+1 for match, -1 for mismatch, -2 for gap), and the adaptive similarity matrix can be used to detect various plagiarism cases.
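
For reference, the fixed-matrix baseline mentioned in the abstract above (+1 for match, -1 for mismatch, -2 for gap) corresponds to a standard Smith-Waterman local alignment with a linear gap penalty. The sketch below computes only the best local score between two token sequences; it is a generic textbook implementation, not the authors' code, and the function name is chosen for this example.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between two sequences,
    using a fixed scoring scheme and a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # Clamp at 0 so an alignment may start anywhere in either sequence.
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

An adaptive variant would replace the constant `match` and `mismatch` values with per-keyword scores derived from the program set, which is the change the paper describes.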

A Plagiarism Detection Technique for Java Program Using Bytecode Analysis (바이트코드 분석을 이용한 자바 프로그램 표절검사기법)

Ji, Jeong-Hoon; Woo, Gyun; Cho, Hwan-Gue
Journal of KIISE:Software and Applications / v.35 no.7 / pp.442-451 / 2008
Most plagiarism detection systems evaluate the similarity of source codes and detect plagiarized program pairs. If source code is used in plagiarism detection, source code security can be a significant problem. Plagiarism detection based on target code can be used to protect the security of source code. In this paper, we propose a new plagiarism detection technique for Java programs that uses bytecode without referring to the source code. The plagiarism detection procedure using bytecode consists of two major steps. First, we generate token sequences from the Java class file by analyzing the code area of methods. Then, we evaluate the similarity between token sequences using adaptive local alignment. According to the experimental results, the distribution of similarities for source code and that for bytecode are very similar. Also, the correlation between the similarities of source code pairs and those of bytecode pairs is high enough for typical test data. The plagiarism detection system using bytecode can be used as a preliminary verification tool before detecting plagiarism by source code comparison.

Generating Pylogenetic Tree of Homogeneous Source Code in a Plagiarism Detection System

Ji, Jeong-Hoon; Park, Su-Hyun; Woo, Gyun; Cho, Hwan-Gue
International Journal of Control, Automation, and Systems / v.6 no.6 / pp.809-817 / 2008
Program plagiarism is widespread due to intelligent software and the global Internet environment. Consequently, the detection of plagiarized source code and software is becoming important, especially in the academic field. Though numerous studies have been reported on detecting plagiarized pairs of codes, we cannot find any in-depth work on understanding the underlying mechanisms of plagiarism. In this paper, we study the evolutionary process of source codes, on the view that the plagiarism procedure can be considered as evolutionary steps of source codes. The final goal of our paper is to reconstruct a tree depicting the evolution process of the source code. To this end, we extend a well-known bioinformatics approach, local alignment, to detect a region of similar code with an adaptive scoring matrix. The asymmetric code similarity based on the local alignment can be considered one of the main contributions of this paper. The phylogenetic tree, or evolution tree, of source codes can be reconstructed using this asymmetric measure. To show the effectiveness and efficiency of the phylogeny construction algorithm, we conducted experiments with more than 100 real source codes obtained from the East-Asia ICPC (International Collegiate Programming Contest). Our experiments showed that the proposed algorithm is quite successful in reconstructing the evolutionary direction, which enables us to identify plagiarized codes more accurately and reliably. Also, the phylogeny construction algorithm has been successfully implemented on top of the plagiarism detection system of an automatic program evaluation system.
