• Title/Summary/Keyword: Plagiarism detection program


A Study on Efficient Program Plagiarism Detection (효율적인 프로그램 표절 탐지에 관한 연구)

  • Ahn Byung-Ryul; Kim Moon-Hyun
    • Proceedings of the Korea Information Processing Society Conference / 2006.05a / pp.147-150 / 2006
  • This paper presents a method and supporting theory for effectively detecting plagiarism of program source code written in various languages. We analyze the strengths and weaknesses of existing program plagiarism detection software and, to overcome the weaknesses in particular, introduce a plagiarism detection method based on pattern matching. We then present a more advanced automatic plagiarism detection system that resolves the problems observed in existing pattern-matching-based methods.

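The pattern-matching approach mentioned in the abstract above is not spelled out; the sketch below shows one common token-based variant (n-gram overlap scored with a Jaccard index, tokenized naively with `words`). It is only an indication of the kind of matching such tools perform, not the authors' algorithm.

```haskell
import qualified Data.Set as Set

-- Illustrative token n-gram overlap (Jaccard index) between two token streams.
-- Tokenisation here is just `words`; a real tool would use a language-aware lexer.
ngrams :: Int -> [String] -> Set.Set [String]
ngrams n ts
  | length ts < n = Set.empty
  | otherwise     = Set.fromList [take n (drop i ts) | i <- [0 .. length ts - n]]

similarity :: Int -> [String] -> [String] -> Double
similarity n a b
  | unionSize == 0 = 0
  | otherwise      = fromIntegral interSize / fromIntegral unionSize
  where
    ga = ngrams n a
    gb = ngrams n b
    interSize = Set.size (Set.intersection ga gb)
    unionSize = Set.size (Set.union ga gb)

main :: IO ()
main = print (similarity 3 (words "int total = a + b ;")
                           (words "int total = a + b ; return total ;"))
```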

Recent Information on the Plagiarism Prevention (표절 방지에 관한 최근 정보)

  • Lee, Sung-Ho
    • Development and Reproduction / v.15 no.1 / pp.71-76 / 2011
  • Due to its role in maintaining the health of scientific societies, research ethics (or integrity) is receiving notable attention from academia, governments, and even individuals who are not engaged in scientific research. In this paper, I introduce some valuable papers dealing with plagiarism as a representative form of research misconduct. In general, research results about to be published must meet the crucial scientific criteria: originality, accuracy, reproducibility, precision, and research ethics. The definition of plagiarism is "appropriation of another person's ideas, processes, results, or words without giving appropriate credit." Compared to fabrication and falsification, plagiarism is often considered a minor misconduct. When intentional, however, plagiarism can amount to theft of intellectual product. The context of plagiarism is not restricted to the stage of publication; it extends to the prior stages of proposing (preparing the research proposal), performing (executing the research), and reviewing (writing review papers). Duplicate publication is regarded as self-plagiarism under a broad interpretation of plagiarism. To avoid the dangers of plagiarism, earnest efforts from all members of the scientific community are needed. First of all, researchers should maintain transparency and integrity in their scientific work. Editorial board members and reviewers should maintain fairness and well-deserved qualifications. Governments and research foundations must be willing to provide sufficient financial and policy support to scientific societies. Upgraded editorial services, good use of plagiarism detection tools, and thorough instruction on how to write an honest scientific paper will contribute to building a healthy basis for scientific communities.

An Adaptive Algorithm for Plagiarism Detection in a Controlled Program Source Set (제한된 프로그램 소스 집합에서 표절 탐색을 위한 적응적 알고리즘)

  • Ji, Jeong-Hoon; Woo, Gyun; Cho, Hwan-Gue
    • Journal of KIISE: Software and Applications / v.33 no.12 / pp.1090-1102 / 2006
  • This paper suggests a new algorithm for detecting plagiarism among a set of source codes that are constrained to be functionally equivalent, such as those submitted for a programming assignment or a programming contest problem. The typical algorithms used so far are based on Greedy String Tiling, which seeks perfect matches of substrings, and on similarity analysis based on the local alignment of two strings. This paper introduces a new method for detecting similar intervals of the given programs based on an adaptive similarity matrix, each entry of which is the logarithm of the probability of a keyword, estimated from the keyword frequencies in the given set of programs. We experimented with this method using programs submitted for more than 10 real programming contests. According to the experimental results, this method has several advantages over the previous one, which uses a fixed similarity matrix (+1 for a match, -1 for a mismatch, -2 for a gap), and the adaptive similarity matrix can be used for detecting various plagiarism cases.
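As a hedged illustration of the adaptive-similarity-matrix idea, the sketch below derives a per-keyword match score from keyword frequencies over the whole program set, using -log p as one plausible reading of "the logarithm of the probabilities of the keywords". The paper's exact weighting and its local-alignment step are not reproduced.

```haskell
import qualified Data.Map.Strict as Map

-- Illustrative adaptive scoring: a keyword's match reward is -log of its relative
-- frequency over the whole program set, so rare keywords count as stronger evidence.
keywordScores :: [[String]] -> Map.Map String Double
keywordScores programs =
  Map.map (\c -> negate (log (fromIntegral c / total))) counts
  where
    counts = Map.fromListWith (+) [(k, 1 :: Int) | p <- programs, k <- p]
    total  = fromIntegral (sum (Map.elems counts)) :: Double

matchScore :: Map.Map String Double -> String -> String -> Double
matchScore scores a b
  | a == b    = Map.findWithDefault 1.0 a scores  -- adaptive match reward
  | otherwise = -1.0                              -- fixed mismatch penalty (illustrative)

main :: IO ()
main = do
  let corpus = [words "for int if return", words "for int while return"]
      scores = keywordScores corpus
  -- the rarer keyword "while" earns a larger match reward than the common "for"
  print (matchScore scores "while" "while", matchScore scores "for" "for")
```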

Enhancing the performance of code-clone detection tools using code2vec (code2vec을 이용한 유사도 감정 도구의 성능 개선)

  • Um, Taeho; Hong, Sung Moon; Yang, Joon Hyuk; Jang, Hyo Seok; Doh, Kyung-Goo
    • Journal of Software Assessment and Valuation / v.17 no.1 / pp.31-40 / 2021
  • Plagiarism refers to the act of using original material as if it were one's own without revealing the source. Plagiarism of source code causes a variety of problems, including legal disputes. Plagiarism in software projects is usually determined by measuring similarity through comparing every pair of source files between two projects. However, blindly comparing every pair imposes a huge computational burden, which is a major reason why more accurate tools are not used. If we can compare only the pairs that are likely to be clones, eliminating pairs that cannot be clones, we can concentrate on improving the accuracy of detection. In this paper, we propose a method of selecting highly probable candidate clone pairs by pre-classifying suspected source code using a machine-learning model called code2vec.
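The pre-classification step can be pictured as a cheap filter over embedding vectors. The sketch below keeps only pairs whose cosine similarity exceeds a cutoff, which is the kind of candidate selection the abstract describes, not the authors' actual pipeline; the file names, vectors, and 0.8 threshold are made up, and the embeddings are assumed to come from a model such as code2vec.

```haskell
-- Illustrative candidate filtering: keep only pairs whose embedding vectors are
-- close in cosine similarity before running a heavyweight clone detector.
cosine :: [Double] -> [Double] -> Double
cosine xs ys = dot / (norm xs * norm ys)
  where
    dot    = sum (zipWith (*) xs ys)
    norm v = sqrt (sum (map (^ (2 :: Int)) v))

candidatePairs :: Double -> [(FilePath, [Double])] -> [(FilePath, FilePath)]
candidatePairs threshold embedded =
  [ (fa, fb)
  | (i, (fa, va)) <- zip [0 :: Int ..] embedded
  , (fb, vb)      <- drop (i + 1) embedded
  , cosine va vb >= threshold ]

main :: IO ()
main = print (candidatePairs 0.8
                [ ("A.java", [1.0, 0.0, 1.0])
                , ("B.java", [1.0, 0.1, 0.9])
                , ("C.java", [0.0, 1.0, 0.0]) ])
```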

A Method for Detecting Program Plagiarism Comparing Class Structure Graphs (클래스 구조 그래프 비교를 통한 프로그램 표절 검사 방법)

  • Kim, Yeoneo; Lee, Yun-Jung; Woo, Gyun
    • The Journal of the Korea Contents Association / v.13 no.11 / pp.37-47 / 2013
  • Recently, many research results on program comparison have been reported, since code theft has become frequent with the increase of code mobility. This paper proposes a plagiarism detection method using class structures. The proposed method constructs a graph representing the referential relationship between member variables and methods. This relationship is represented as a bipartite graph, and a test for graph isomorphism is applied to the set of graphs to measure the similarity of programs. To measure the effectiveness of this method, an experiment was conducted on a test set of Java source codes submitted as solutions for programming assignments in the Object-Oriented Programming course at Pusan National University in 2012. To evaluate the accuracy of the proposed method, its F-measure is compared to those of JPlag and Stigmata. According to the experimental results, the F-measure of the proposed method is higher than those of JPlag and Stigmata by 0.17 and 0.34, respectively.
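A hedged sketch of the class-structure representation: each class becomes a bipartite graph between its methods and the member variables they reference. Instead of the paper's graph-isomorphism test, the sketch compares the sorted degree sequences of the method side with a Dice coefficient, a rename-insensitive but much weaker comparison; it only illustrates the representation.

```haskell
import Data.List (sort)

-- A class abstracted as a bipartite graph: each method paired with the member
-- variables it references.
type ClassGraph = [(String, [String])]

degreeSeq :: ClassGraph -> [Int]
degreeSeq g = sort [length vars | (_, vars) <- g]

-- size of the multiset intersection of two sorted lists
common :: [Int] -> [Int] -> Int
common (x:xs) (y:ys)
  | x == y    = 1 + common xs ys
  | x <  y    = common xs (y:ys)
  | otherwise = common (x:xs) ys
common _ _ = 0

-- Dice coefficient over the two degree sequences (1.0 = structurally identical
-- under this crude signature, regardless of identifier renaming).
classSimilarity :: ClassGraph -> ClassGraph -> Double
classSimilarity a b
  | denom == 0 = 1
  | otherwise  = 2 * fromIntegral (common sa sb) / fromIntegral denom
  where
    sa = degreeSeq a
    sb = degreeSeq b
    denom = length sa + length sb

main :: IO ()
main = print (classSimilarity
                [("getX", ["x"]), ("setX", ["x"]), ("area", ["x", "y"])]
                [("getA", ["a"]), ("setA", ["a"]), ("size", ["a", "b"])])
```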

Tuning the Performance of Haskell Parallel Programs Using GC-Tune (GC-Tune을 이용한 Haskell 병렬 프로그램의 성능 조정)

  • Kim, Hwamok; An, Hyungjun; Byun, Sugwoo; Woo, Gyun
    • KIISE Transactions on Computing Practices / v.23 no.8 / pp.459-465 / 2017
  • Although the performance of computer hardware is increasing due to the development of manycore technologies, software is not showing a proportional increase in throughput. Functional languages can be a viable alternative for improving the performance of parallel programs, since such languages have inherent parallelism in evaluating pure expressions without side effects. Specifically, Haskell is notably popular for parallel programming because it provides easy-to-use parallel constructs based on monads. However, the scalability of parallel programs in Haskell tends to fluctuate as the number of cores increases, and the garbage collector is suspected to be the source of these fluctuations because it affects both the space and the time needed to execute the programs. This paper uses the tuning tool GC-Tune to improve the scalability of the performance. Our experiment was conducted with a parallel plagiarism detection program, and the scalability improved: the fluctuation range of the speedup was narrowed by 39% compared to running the program without any tuning.
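GC-Tune's own interface is not described in the abstract, so it is not shown here. The sketch below is a tiny parallel Haskell workload (requires the parallel package) together with the standard GHC RTS flags that such a tuner would vary to stabilize parallel speedup.

```haskell
import Control.Parallel.Strategies (parMap, rdeepseq)

-- A tiny parallel workload whose scalability depends on GC behaviour.
-- The flags below are standard GHC options, not GC-Tune's interface:
--
--   ghc -O2 -threaded -rtsopts Main.hs
--   ./Main +RTS -N8 -A64m -H1g -s -RTS
--
-- -N sets the number of cores, -A the allocation-area size, -H the suggested
-- heap size, and -s prints GC statistics used to judge each setting.
expensive :: Int -> Int
expensive n = length (filter odd [1 .. n * 10000])

main :: IO ()
main = print (sum (parMap rdeepseq expensive [1 .. 64]))
```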

Similarity Detection in Object Codes and Design of Its Tool (목적 코드에서 유사도 검출과 그 도구의 설계)

  • Yoo, Jang-Hee
    • Journal of Software Assessment and Valuation / v.16 no.2 / pp.1-8 / 2020
  • Detecting the similarity that indicates plagiarism or duplication of computer programs requires different analysis methods and tools depending on the programming language used in the implementation and the kind of code to be analyzed. In recent years, similarity appraisal of object code in embedded systems, which requires considerable resources along with a more complicated procedure and more advanced skill than source code, has been increasing. In this study, we describe a method for analyzing the similarity of functional units in assembly language obtained by converting the object code with reverse-engineering approaches such as reverse assembly. An instruction and operand table for comparing similarity is generated by syntactic analysis of the assembly code, and a tool for detecting the similarity is designed.
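A minimal sketch of the instruction-table idea, assuming disassembled text is already available: each line's first word is treated as the mnemonic, a per-function frequency table is built, and two tables are compared by their overlapping counts. The tool's reverse-assembly step and its full operand handling are omitted.

```haskell
import qualified Data.Map.Strict as Map

-- Per-function table of mnemonic frequencies built from disassembled text
-- (deliberately naive parsing: first word of each line is the mnemonic).
type InstrTable = Map.Map String Int

toTable :: [String] -> InstrTable
toTable asmLines =
  Map.fromListWith (+) [(head ws, 1) | l <- asmLines, let ws = words l, not (null ws)]

-- Overlap of the two frequency tables, normalised by the larger instruction count.
tableSimilarity :: InstrTable -> InstrTable -> Double
tableSimilarity a b = fromIntegral shared / fromIntegral total
  where
    shared = sum (Map.elems (Map.intersectionWith min a b))
    total  = max 1 (max (sum (Map.elems a)) (sum (Map.elems b)))

main :: IO ()
main = do
  let f1 = ["push ebp", "mov ebp, esp", "mov eax, [ebp+8]", "ret"]
      f2 = ["push ebp", "mov ebp, esp", "xor eax, eax", "ret"]
  print (tableSimilarity (toTable f1) (toTable f2))
```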

Cross-architecture Binary Function Similarity Detection based on Composite Feature Model

  • Xiaonan Li; Guimin Zhang; Qingbao Li; Ping Zhang; Zhifeng Chen; Jinjin Liu; Shudan Yue
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.8 / pp.2101-2123 / 2023
  • Recent studies have shown that neural-network-based binary code similarity detection performs well in vulnerability mining, plagiarism detection, and malicious code analysis. However, existing cross-architecture methods still suffer from insufficient feature characterization and low discrimination accuracy. To address these issues, this paper proposes a cross-architecture binary function similarity detection method based on a composite feature model (SDCFM). First, a binary function is converted into a vector representation according to the proposed composite feature model, which is composed of instruction statistical features, control flow graph structural features, and application program interface calling behavioral features. Then, the composite features are embedded by the proposed hierarchical embedding network based on a graph neural network, in which block-level features and function-level features are processed separately and finally fused into the embedding. In addition, to make the trained model more accurate and stable, our method utilizes the embeddings of predecessor nodes to modify the node embedding in the iterative updating process of the graph neural network. To assess the effectiveness of the composite feature model, we compare SDCFM with state-of-the-art methods on benchmark datasets. The experimental results show that SDCFM performs well both in the area under the curve for the binary function similarity detection task and in the vulnerable candidate function ranking for the vulnerability search task.
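A hedged sketch of what a composite feature vector might look like before embedding: instruction statistics, control-flow-graph structural counts, and API-call counts concatenated per function. The field names are illustrative, and SDCFM's hierarchical graph-neural-network embedding is not reproduced.

```haskell
-- Illustrative composite feature vector for one binary function, combining the
-- three feature groups named in the abstract; field contents are made up.
data BinaryFunction = BinaryFunction
  { instrCounts :: [Double]   -- instruction statistics (e.g. arithmetic / transfer / call counts)
  , cfgNodes    :: Int        -- number of basic blocks in the control flow graph
  , cfgEdges    :: Int        -- number of control-flow edges
  , apiCalls    :: [Double]   -- call counts per API category
  }

compositeFeatures :: BinaryFunction -> [Double]
compositeFeatures f =
  instrCounts f
    ++ [fromIntegral (cfgNodes f), fromIntegral (cfgEdges f)]
    ++ apiCalls f

main :: IO ()
main = print (compositeFeatures (BinaryFunction [12, 3, 2] 5 7 [1, 0, 2]))
```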

Performance Comparison between Haskell Eval Monad and Cloud Haskell (Haskell Eval 모나드와 Cloud Haskell 간의 성능 비교)

  • Kim, Yeoneo; An, Hyungjun; Byun, Sugwoo; Woo, Gyun
    • Journal of KIISE / v.44 no.8 / pp.791-802 / 2017
  • Competition in the modern CPU market has shifted from increasing the clock speed of a single core to increasing the number of cores. As such, there is growing interest in parallel programming to maximize the use of the resources of manycore processors. In this paper, we evaluate parallel programming models in Haskell to find an advisable model for manycore environments. Specifically, we used the Eval monad and Cloud Haskell to develop two versions each of two parallel programs: plagiarism detection and K-means. We then evaluated the performance of the developed programs in 32-core and 120-core environments. The results of our experiment show that the Eval monad is highly efficient in an environment with a small number of cores. On the other hand, as the number of cores increases, Cloud Haskell shows a 37% improvement in runtime and a 134% improvement in scalability over the Eval monad.
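For the shared-memory side of the comparison, the sketch below shows the classic Eval-monad pattern (rpar/rseq from Control.Parallel.Strategies, requiring the parallel and deepseq packages; compile with `ghc -O2 -threaded` and run with `+RTS -N`) on a stand-in workload. It is a generic example, not the paper's plagiarism-detection or K-means code, and the Cloud Haskell variant, which spawns processes on distributed nodes and communicates by message passing, is not shown.

```haskell
import Control.DeepSeq (force)
import Control.Parallel.Strategies (rpar, rseq, runEval)

-- Stand-in for one pairwise comparison in a plagiarism-detection workload.
score :: Int -> Int
score n = length (filter even [1 .. n * 1000])

-- Eval-monad parallelism: one half of the work is sparked with rpar, the other
-- evaluated locally with rseq, and the spark is waited on before returning.
parallelScores :: [Int] -> ([Int], [Int])
parallelScores jobs = runEval $ do
  let (front, back) = splitAt (length jobs `div` 2) jobs
  a <- rpar (force (map score front))   -- sparked for parallel evaluation
  b <- rseq (force (map score back))    -- evaluated on the current thread
  _ <- rseq a                           -- wait for the spark to finish
  return (a, b)

main :: IO ()
main = print (let (a, b) = parallelScores [1 .. 200] in sum a + sum b)
```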