Title/Summary/Keyword: Plagiarism Detection

A Plagiarism Detection Technique for Source Codes Considering Data Structures (데이터 구조를 고려한 소스코드 표절 검사 기법)

  • Lee, Kihwa;Kim, Yeoneo;Woo, Gyun
    • KIPS Transactions on Computer and Communication Systems / v.3 no.6 / pp.189-196 / 2014
  • Though plagiarism is illegal and should be avoided, it still occurs frequently. In particular, source code is plagiarized more often than other material because its digital nature makes it easy to copy. A variety of studies have been reported on preventing code plagiarism. However, although a source code consists of both data structures and algorithms, previous plagiarism detection techniques for source codes do not consider the data structures. In this paper, a plagiarism detection technique for source codes that considers data structures is proposed. Specifically, the data structures of two source codes are represented as sets of trees and compared with each other using the Hungarian method. To show the usefulness of this technique, an experiment was performed on 126 source codes submitted as homework in an object-oriented programming course. When both the data structures and the algorithms of the source codes are considered, the precision and the F-measure improve by 22.6% and 19.3%, respectively, over the case where only the algorithms are considered.
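
As an illustration of the matching step described above, here is a minimal Python sketch: each program's data structures are modeled as small labeled trees, a toy similarity function compares tree pairs, and the Hungarian method (via scipy) finds the best one-to-one matching between the two tree sets. The tree encoding and the similarity function are stand-ins; the paper's actual definitions may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def tree_similarity(t1, t2):
    """Toy similarity for (label, children) trees: full credit for identical
    trees, partial credit for a matching root plus recursively matched kids."""
    if t1 == t2:
        return 1.0
    (label1, kids1), (label2, kids2) = t1, t2
    score = 0.5 if label1 == label2 else 0.0
    width = max(len(kids1), len(kids2), 1)
    for c1, c2 in zip(kids1, kids2):
        score += tree_similarity(c1, c2) / (2 * width)
    return score

def match_tree_sets(set_a, set_b):
    """Pair up two sets of data-structure trees with the Hungarian method
    and return the mean similarity over the matched pairs."""
    cost = np.zeros((len(set_a), len(set_b)))
    for i, ta in enumerate(set_a):
        for j, tb in enumerate(set_b):
            cost[i, j] = -tree_similarity(ta, tb)  # negate: solver minimizes
    rows, cols = linear_sum_assignment(cost)
    return -cost[rows, cols].mean()

# Trees as (label, [children]); e.g. a struct holding two fields.
a = [("struct", [("int", []), ("int", [])]), ("list", [("int", [])])]
b = [("list", [("int", [])]), ("struct", [("int", []), ("float", [])])]
print(match_tree_sets(a, b))  # high score: the sets match up to one field type
```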

Plagiarism Detection among Source Codes using Adaptive Methods

  • Lee, Yun-Jung;Lim, Jin-Su;Ji, Jeong-Hoon;Cho, Hwan-Gue;Woo, Gyun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.6 / pp.1627-1648 / 2012
  • We propose an adaptive method for detecting plagiarized pairs from a large set of source codes. This method is adaptive in that it uses an adaptive algorithm and provides an adaptive threshold for determining plagiarism. Conventional algorithms are based on greedy string tiling or on local alignments of two code strings. However, most of them are not adaptive; they do not consider the characteristics of the program set, which causes problems for a program set in which all the programs are inherently similar. We propose adaptive local alignment, a variant of local alignment that uses an adaptive similarity matrix. Each entry of this matrix is the logarithm of the probability of a keyword based on its frequency in the given program set. We also propose an adaptive threshold based on the local outlier factor (LOF), which represents the likelihood of an entity being an outlier. Experimental results indicate that our method is more sensitive than JPlag, which uses greedy string tiling, for detecting plagiarism-suspected code pairs. Further, the adaptive threshold based on the LOF is shown to be effective, and the detection performance shows high sensitivity with negligible loss of specificity, compared with that of a fixed threshold.
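
A hedged sketch of the adaptive-scoring idea from this abstract: token match scores are set to the negative log of each token's frequency in the program set, so rare tokens weigh more, and a standard Smith-Waterman local alignment consumes those scores. The gap and mismatch penalties below are illustrative assumptions, and the LOF thresholding step is omitted.

```python
import math
from collections import Counter

def adaptive_scores(programs):
    """Map each token to -log(frequency / total) over the whole program set."""
    counts = Counter(tok for prog in programs for tok in prog)
    total = sum(counts.values())
    return {tok: -math.log(c / total) for tok, c in counts.items()}

def local_alignment(a, b, scores, gap=-1.0, mismatch=-1.0):
    """Smith-Waterman over token lists; the match score is adaptive."""
    m, n = len(a), len(b)
    H = [[0.0] * (n + 1) for _ in range(m + 1)]
    best = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = scores[a[i - 1]] if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0.0, H[i-1][j-1] + s, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best

progs = [["for", "i", "=", "0", "hash_init"],
         ["for", "j", "=", "0"],
         ["for", "x", "hash_init"]]
w = adaptive_scores(progs)
# The rarer 'hash_init' match dominates the best local segment (~1.79).
print(local_alignment(progs[0], progs[2], w))
```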

Generating Phylogenetic Tree of Homogeneous Source Code in a Plagiarism Detection System

  • Ji, Jeong-Hoon;Park, Su-Hyun;Woo, Gyun;Cho, Hwan-Gue
    • International Journal of Control, Automation, and Systems / v.6 no.6 / pp.809-817 / 2008
  • Program plagiarism is widespread due to intelligent software and the global Internet environment. Consequently, the detection of plagiarized source code and software is becoming important, especially in the academic field. Though numerous studies have been reported on detecting plagiarized pairs of codes, we cannot find any profound work on understanding the underlying mechanisms of plagiarism. In this paper, we study the evolutionary process of source codes, regarding the plagiarism procedure as evolutionary steps of source codes. The final goal of our paper is to reconstruct a tree depicting the evolution process of the source code. To this end, we extend a well-known bioinformatics approach, local alignment, to detect regions of similar code with an adaptive scoring matrix. The asymmetric code similarity based on the local alignment can be considered one of the main contributions of this paper. The phylogenetic tree, or evolution tree, of source codes can be reconstructed using this asymmetric measure. To show the effectiveness and efficiency of the phylogeny construction algorithm, we conducted experiments with more than 100 real source codes obtained from the East-Asia ICPC (International Collegiate Programming Contest). Our experiments showed that the proposed algorithm is quite successful in reconstructing the evolutionary direction, which enables us to identify plagiarized codes more accurately and reliably. Also, the phylogeny construction algorithm has been successfully implemented on top of the plagiarism detection system of an automatic program evaluation system.
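
The abstract does not spell out the tree-construction algorithm itself, so the following is only a speculative Python sketch of how an asymmetric similarity matrix could be turned into an evolution tree: each program is attached to the predecessor that best explains it. The greedy parent choice and the reading of sim[i][j] as "how much of program i survives inside program j" are assumptions, not the paper's method.

```python
def build_evolution_tree(sim, root):
    """Greedy arborescence: every non-root node's parent is the program
    with the highest asymmetric similarity toward it."""
    n = len(sim)
    parent = {root: None}
    for j in range(n):
        if j == root:
            continue
        parent[j] = max((i for i in range(n) if i != j), key=lambda i: sim[i][j])
    return parent

# Toy asymmetric matrix for three codes forming a 0 -> 1 -> 2 copy chain.
sim = [[1.0, 0.9, 0.5],
       [0.4, 1.0, 0.8],
       [0.2, 0.3, 1.0]]
print(build_evolution_tree(sim, root=0))  # {0: None, 1: 0, 2: 1}
```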

A Study on Text Pattern Analysis Applying Discrete Fourier Transform - Focusing on Sentence Plagiarism Detection - (이산 푸리에 변환을 적용한 텍스트 패턴 분석에 관한 연구 - 표절 문장 탐색 중심으로 -)

  • Lee, Jung-Song;Park, Soon-Cheol
    • Journal of Korea Society of Industrial Information Systems / v.22 no.2 / pp.43-52 / 2017
  • Pattern analysis is one of the most important techniques in the signal processing, image processing, and text mining fields. The discrete Fourier transform (DFT) is generally used to analyze the patterns of signals and images, and we reasoned that DFT could also be used for the analysis of text patterns. In this paper, DFT is adapted, for the first time to our knowledge, to sentence plagiarism detection, which detects whether the text patterns of a document exist in other documents. We signalize texts by converting them to ASCII codes and apply the cross-correlation method to detect simple text plagiarism such as cut-and-paste and term relocation. WordNet is used to find similarities in order to detect plagiarism that uses synonyms, translations, summarizations, etc. The data set used in our experiments is the 2013 corpus provided by PAN, one of the well-known workshops on text plagiarism. Our method ranked fourth among the eleven most outstanding plagiarism detection methods.
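
A minimal sketch of the signal view of text described above: sentences become sequences of ASCII codes, and FFT-based cross-correlation locates a copied fragment inside a document. The mean-centering and the absence of any WordNet step are simplifications of the paper's pipeline.

```python
import numpy as np

def ascii_signal(text):
    """Turn a string into a float signal of ASCII code points."""
    return np.array([ord(c) for c in text], dtype=float)

def cross_correlate(doc, fragment):
    """FFT cross-correlation; a sharp peak suggests the fragment occurs
    (possibly shifted) inside the document, at the peak's offset."""
    d = ascii_signal(doc)
    f = ascii_signal(fragment)
    d, f = d - d.mean(), f - f.mean()
    n = len(d) + len(f) - 1  # zero-pad to avoid circular wraparound
    corr = np.fft.irfft(np.fft.rfft(d, n) * np.conj(np.fft.rfft(f, n)), n)
    return int(np.argmax(corr)), float(corr.max())

doc = "the quick brown fox jumps over the lazy dog"
frag = "brown fox"
offset, peak = cross_correlate(doc, frag)
print(offset, peak)  # offset should be near 10, where "brown fox" starts
```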

Plagiarism Detection Using Dependency Graph Analysis Specialized for JavaScript (자바스크립트에 특화된 프로그램 종속성 그래프를 이용한 표절 탐지)

  • Kim, Shin-Hyong;Han, Tai-Sook
    • Journal of KIISE: Software and Applications / v.37 no.5 / pp.394-402 / 2010
  • JavaScript is one of the most popular languages for developing web sites and web applications. Since applications written in JavaScript are sent to clients as the original source code, they are easily exposed to plagiarists. Therefore, a method to detect plagiarized JavaScript programs is necessary. Conventional program dependency graph (PDG) based approaches are not suitable for analyzing JavaScript programs because they do not reflect the dynamic features of JavaScript. They also generate false positives in some cases and show inefficiency with a large search space. We devise a JavaScript-specific PDG (JS PDG) that captures the dynamic features of JavaScript and propose a JavaScript plagiarism detection method for precise and fast detection. We evaluate the proposed plagiarism detection method experimentally. Our experiments show that our approach can detect the false positives generated by conventional PDGs and can prune the plagiarism search space.
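
To make the PDG idea concrete, here is a toy Python sketch (using networkx rather than a real JavaScript front end): statements become labeled nodes, dependences become edges, and a plagiarized copy with renamed variables still produces a matching subgraph. The node labels and the matcher below are simplifications of what a real JS PDG would carry.

```python
import networkx as nx
from networkx.algorithms import isomorphism

def make_pdg(edges, kinds):
    """edges: (src, dst) dependence pairs; kinds: node -> statement kind."""
    g = nx.DiGraph()
    for node, kind in kinds.items():
        g.add_node(node, kind=kind)
    g.add_edges_from(edges)
    return g

# Original: x = f(); y = x + 1; return y
g1 = make_pdg([("s1", "s2"), ("s2", "s3")],
              {"s1": "call", "s2": "arith", "s3": "return"})
# Plagiarized copy with renamed variables: same dependence shape.
g2 = make_pdg([("t1", "t2"), ("t2", "t3")],
              {"t1": "call", "t2": "arith", "t3": "return"})

matcher = isomorphism.DiGraphMatcher(
    g1, g2, node_match=lambda a, b: a["kind"] == b["kind"])
print(matcher.subgraph_is_isomorphic())  # True: dependence structure matches
```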

Strengthening Publication Ethics for KODISA Journals: Learning from the Cases of Plagiarism

  • Hwang, Hee-Joong;Lee, Jong-Ho;Lee, Jung-Wan;Kim, Young-Ei;Yang, Hoe-Chang;Youn, Myoung-Kil;Kim, Dong-Ho
    • Journal of Distribution Science / v.13 no.4 / pp.5-8 / 2015
  • Purpose - The purpose of this paper is to review, analyze, and learn from the most recent cases of plagiarism and to identify and promote ethical practices in research and publication. Research design, data, and methodology - This is a case study, an analytical approach that focuses on analyzing the most recent cases of plagiarism to identify ethical issues and concerns in journal publication practices. Results - Despite the availability of many software and web-based applications and programs to detect plagiarism, there is no universal or perfect plagiarism detection application available to ease the editorial responsibility. A lack of understanding of the concept of plagiarism, and simple ignorance of it, were the main reasons for the cases of plagiarism. Conclusions - Some of the plagiarism cases reveal a lack of knowledge in the proper application of in-text citations and references, including quoting, requoting, paraphrasing, and citing sources. Furthermore, recognizing and treating distorted and falsified primary and secondary research data as plagiarism is essential to enhance ethical practices in journal publication.

Recent Information on the Plagiarism Prevention (표절 방지에 관한 최근 정보)

  • Lee, Sung-Ho
    • Development and Reproduction / v.15 no.1 / pp.71-76 / 2011
  • Due to its role in maintaining the health of scientific societies, research ethics (or integrity) is notably receiving attention from academia, governments, and even individuals who are not engaged in scientific research. In this paper, I introduce some valuable papers dealing with plagiarism as a representative research misconduct. In general, the results a researcher is about to publish must meet crucial scientific criteria: originality, accuracy, reproducibility, precision, and research ethics. The definition of plagiarism is "appropriation of another person's ideas, processes, results, or words without giving appropriate credit." Compared to fabrication and falsification, plagiarism is often considered a minor misconduct. With intentionality, however, plagiarism can amount to 'theft of intellectual product'. The context of plagiarism is not restricted to the stage of publication; it extends to the prior stages of proposing (preparing the research proposal) and performing (executing the research), as well as reviewing (writing review papers). Duplicate publication is regarded as self-plagiarism under a broad interpretation of plagiarism. To avoid the dangers of plagiarism, earnest efforts from all members of the scientific community are needed. First of all, researchers should maintain transparency and integrity in their scientific work. Editorial board members and reviewers should maintain fairness and proper qualifications. Governments and research foundations must be willing to provide sufficient financial and policy support to scientific societies; upgraded editorial services, good use of plagiarism detection tools, and thorough instruction on how to write an honest scientific paper will contribute to building a healthy basis for scientific communities.

A Study on Plagiarism Detection System for Documents (문서를 위한 표절 탐지 시스템에 관한 연구)

  • An Byeong-Ryeol;Kim Mun-Hyeon
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2006.05a / pp.413-415 / 2006
  • In the digital age, anyone can easily access information, so cases of illegally copying other people's information and using it without permission have increased. This means that, while creating knowledge through substantial investment and effort is important, managing and protecting that knowledge has emerged as an equally important task. In this paper, we propose a new method and theory for effectively detecting cases where the intellectual property rights of others have been infringed through plagiarism.


Automated Detecting and Tracing for Plagiarized Programs using Gumbel Distribution Model (굼벨 분포 모델을 이용한 표절 프로그램 자동 탐색 및 추적)

  • Ji, Jeong-Hoon;Woo, Gyun;Cho, Hwan-Gue
    • The KIPS Transactions: Part A / v.16A no.6 / pp.453-462 / 2009
  • Studies on software plagiarism detection, prevention, and judgement have become widespread due to the growing interest in, and importance of, the protection and authentication of software intellectual property. Many previous studies focused on comparing all pairs of submitted codes using attribute counting, token patterns, program parse trees, and similarity measuring algorithms. It is important to provide a clear-cut model for distinguishing plagiarism from collaboration. This paper proposes a source code clustering algorithm using a probability model on an extreme value distribution. First, we propose an asymmetric distance measure pdist($P_a$, $P_b$) to measure the similarity of $P_a$ and $P_b$. Then, we construct the Plagiarism Direction Graph (PDG) for a given program set using pdist($P_a$, $P_b$) as edge weights, and transform the PDG into a Gumbel Distance Graph (GDG) model, since we found that the distribution of pdist($P_a$, $P_b$) scores is similar to the well-known Gumbel distribution. Second, we newly define pseudo-plagiarism, a sort of virtual plagiarism forced by a very strong functional requirement in the specification. We conducted experiments with 18 groups of programs (more than 700 source codes) collected from the ICPC (International Collegiate Programming Contest) and KOI (Korean Olympiad in Informatics) programming contests. The experiments showed that most plagiarized codes could be detected with high sensitivity and that our algorithm successfully separates real plagiarism from pseudo-plagiarism.
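
A hedged sketch of the statistical step suggested by the abstract: fit a Gumbel distribution to the pairwise scores with scipy and flag the extreme upper tail as plagiarism suspects. The synthetic scores and the 0.99 cutoff are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.stats import gumbel_r

# Synthetic stand-in for pairwise pdist scores (assumption: real scores
# would come from the asymmetric alignment measure).
rng = np.random.default_rng(0)
scores = rng.gumbel(loc=10.0, scale=2.0, size=500)
scores = np.append(scores, [28.0, 31.0])  # two suspiciously similar pairs

# Fit the Gumbel model and threshold at the 99th percentile (illustrative).
loc, scale = gumbel_r.fit(scores)
threshold = gumbel_r.ppf(0.99, loc=loc, scale=scale)
suspects = scores[scores > threshold]
print(f"threshold={threshold:.2f}, flagged={suspects}")
```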

A Study of Natural Language Plagiarism Detection

  • Ahn, Byung-Ryul;Kim, Heon;Kim, Moon-Hyun
    • Proceedings of the Korea Society of Information Technology Applications Conference / 2005.11a / pp.325-329 / 2005
  • A vast amount of information is generated and shared in this active digital era. As digital informatization proceeds, most documents exist in digitalized form, and this kind of information is on the increase. It is no exaggeration to say that such newly created information and knowledge affect the competitiveness and the future of our nation. In addition, a lot of investment is being made in information- and knowledge-based industries at the national level, and intensive efforts are being made in research and the development of human resources. In the digital era it has become easier to create and share information, as various document-creation tools have been developed alongside the Internet; as a result, the share of duplicated information is increasing day by day. At present, a lot of the information provided online is being plagiarized or illegally copied. Identifying plagiarism within a tremendous amount of information is very tricky, because original sentences can simply be restructured or have words replaced with similar ones, which makes them look different from the originals. This means that managing and protecting knowledge is now regarded as being as important as creating it through investment and effort. This dissertation suggests a new method and theory for effectively detecting infringement on, and plagiarism of, the intellectual property of others. DICOM (Dynamic Incremental Comparison Method), developed in this research to detect document plagiarism, focuses on realizing a system that can detect plagiarized documents and passages efficiently, accurately, and immediately by creating a variety of active detectors.
