• Title/Summary/Keyword: Source Code Clustering

Search Result 13, Processing Time 0.024 seconds

A design of the PSDG based semantic slicing model for software maintenance (소프트웨어의 유지보수를 위한 PSDG기반 의미분할모형의 설계)

  • Yeo, Ho-Young;Lee, Kee-O;Rhew, Sung-Yul
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.8
    • /
    • pp.2041-2049
    • /
    • 1998
  • This paper suggests a technique for program segmentation and maintenance using PSDG(Post-State Dependency Graph) that improves the quality of a software by identifying and detecting defects in already fixed source code. A program segmentation is performed by utilizing source code analysis which combines the measures of static, dynamic and semantic slicing when we need understandability of defect in programs for corrective maintanence. It provides users with a segmental principle to split a program by tracing state dependency of a source code with the graph, and clustering and highlighting, Through a modeling of the PSDG, elimination of ineffective program deadcode and generalization of related program segments arc possible, Additionally, it can be correlated with other design modeb as STD(State Transition Diagram), also be used as design documents.

  • PDF

A Method of Object Identification from Procedural Programs (절차적 프로그램으로부터의 객체 추출 방법론)

  • Jin, Yun-Suk;Ma, Pyeong-Su;Sin, Gyu-Sang
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.10
    • /
    • pp.2693-2706
    • /
    • 1999
  • Reengineering to object-oriented system is needed to maintain the system and satisfy requirements of structure change. Target systems which should be reengineered to object-oriented system are difficult to change because these systems have no design document or their design document is inconsistent of source code. Using design document to identifying objects for these systems is improper. There are several researches which identify objects through procedural source code analysis. In this paper, we propose automatic object identification method based on clustering of VTFG(Variable-Type-Function Graph) which represents relations among variables, types, and functions. VTFG includes relations among variables, types, and functions that may be basis of objects, and weights of these relations. By clustering related variables, types, and functions using their weights, our method overcomes limit of existing researches which identify too big objects or objects excluding many functions. The method proposed in this paper minimizes user's interaction through automatic object identification and make it easy to reenginner procedural system to object-oriented system.

  • PDF

Shape Design of Passages for Turbine Blade Using Design Optimization System (최적화설계시스템을 이용한 터빈블레이드 냉각통로의 형상설계)

  • Jeong Min-Joong;Lee Joon-Seong
    • Transactions of the Korean Society of Mechanical Engineers A
    • /
    • v.29 no.7 s.238
    • /
    • pp.1013-1021
    • /
    • 2005
  • In this paper, we developed an automatic design optimization system for parametric shape optimization of cooling passages inside axial turbine blades. A parallel three-dimensional thermoelasticity finite element analysis code from an open source system was used to perform automatic thermal and stress analysis of different blade configuration. The developed code was connected to an evolutionary optimizer and built in a design optimization system. Using the optimization system, 279 feasible and optimal solutions were searched. It is provided not only one best solution of the searched solutions, but also information of variation structure and correlation of the 279 solutions in function, variable, and real design spaces. To explore design information, it is proposed a new interpretation approach based on evolutionary clustering and principal component analysis. The interpretation approach might be applicable to the increasing demands in the general area of design optimization.

Hierarchical Clustering Methodology for Source Code Plagiarism Detection (계층적 군집화 기법을 이용한 소스 코드 표절 검사)

  • Sohn, Ki-Rack;Moon, Seung-Mi
    • Journal of The Korean Association of Information Education
    • /
    • v.11 no.1
    • /
    • pp.91-98
    • /
    • 2007
  • Plagiarism is a serious problem in school education due to current technologies such as the internet and word processors. This paper presents how to detect source code plagiarism using similarity based on string comparison methods. The main contribution is to use hierarchical agglomerative clustering technique to classify plagiarism groups, which are then visualized as a dendrogram. Graders can set an empirical threshold to the dendrogram to navigate plagiarism groups. We evaluated the performance of the presented method with a real world data. The result showed the usefulness and applicability of this method.

  • PDF

A Code Clustering Technique for Unifying Method Full Path of Reusable Cloned Code Sets of a Product Family (제품군의 재사용 가능한 클론 코드의 메소드 경로 통일을 위한 코드 클러스터링 방법)

  • Kim, Taeyoung;Lee, Jihyun;Kim, Eunmi
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.1
    • /
    • pp.1-18
    • /
    • 2023
  • Similar software is often developed with the Clone-And-Own (CAO) approach that copies and modifies existing artifacts. The CAO approach is considered as a bad practice because it makes maintenance difficult as the number of cloned products increases. Software product line engineering is a methodology that can solve the issue of the CAO approach by developing a product family through systematic reuse. Migrating product families that have been developed with the CAO approach to the product line engineering begins with finding, integrating, and building them as reusable assets. However, cloning occurs at various levels from directories to code lines, and their structures can be changed. This makes it difficult to build product line code base simply by finding clones. Successful migration thus requires unifying the source code's file path, class name, and method signature. This paper proposes a clustering method that identifies a set of similar codes scattered across product variants and some of their method full paths are different, so path unification is necessary. In order to show the effectiveness of the proposed method, we conducted an experiment using the Apo Games product line, which has evolved with the CAO approach. As a result, the average precision of clustering performed without preprocessing was 0.91 and the number of identified common clusters was 0, whereas our method showed 0.98 and 15 respectively.

Technology Clustering Using Textual Information of Reference Titles in Scientific Paper (과학기술 논문의 참고문헌 텍스트 정보를 활용한 기술의 군집화)

  • Park, Inchae;Kim, Songhee;Yoon, Byungun
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.43 no.2
    • /
    • pp.25-32
    • /
    • 2020
  • Data on patent and scientific paper is considered as a useful information source for analyzing technological information and has been widely utilized. Technology big data is analyzed in various ways to identify the latest technological trends and predict future promising technologies. Clustering is one of the ways to discover new features by creating groups from technology big data. Patent includes refined bibliographic information such as patent classification code whereas scientific paper does not have appropriate bibliographic information for clustering. This research proposes a new approach for clustering data of scientific paper by utilizing reference titles in each scientific paper. In this approach, the reference titles are considered as textual information because each reference consists of the title of the paper that represents the core content of the paper. We collected the scientific paper data, extracted the title of the reference, and conducted clustering by measuring the text-based similarity. The results from the proposed approach are compared with the results using existing methodologies that one is the approach utilizing textual information from titles and abstracts and the other one is a citation-based approach. The suggested approach in this paper shows statistically significant difference compared to the existing approaches and it shows better clustering performance. The proposed approach will be considered as a useful method for clustering scientific papers.

The attacker group feature extraction framework : Authorship Clustering based on Genetic Algorithm for Malware Authorship Group Identification (공격자 그룹 특징 추출 프레임워크 : 악성코드 저자 그룹 식별을 위한 유전 알고리즘 기반 저자 클러스터링)

  • Shin, Gun-Yoon;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of Internet Computing and Services
    • /
    • v.21 no.2
    • /
    • pp.1-8
    • /
    • 2020
  • Recently, the number of APT(Advanced Persistent Threats) attack using malware has been increasing, and research is underway to prevent and detect them. While it is important to detect and block attacks before they occur, it is also important to make an effective response through an accurate analysis for attack case and attack type, these respond which can be determined by analyzing the attack group of such attacks. Therefore, this paper propose a framework based on genetic algorithm for analyzing malware and understanding attacker group's features. The framework uses decompiler and disassembler to extract related code in collected malware, and analyzes information related to author through code analysis. Malware has unique characteristics that only it has, which can be said to be features that can identify the author or attacker groups of that malware. So, we select specific features only having attack group among the various features extracted from binary and source code through the authorship clustering method, and apply genetic algorithm to accurate clustering to infer specific features. Also, we find features which based on characteristics each group of malware authors has that can express each group, and create profiles to verify that the group of authors is correctly clustered. In this paper, we do experiment about author classification using genetic algorithm and finding specific features to express author characteristic. In experiment result, we identified an author classification accuracy of 86% and selected features to be used for authorship analysis among the information extracted through genetic algorithm.

Automated Detecting and Tracing for Plagiarized Programs using Gumbel Distribution Model (굼벨 분포 모델을 이용한 표절 프로그램 자동 탐색 및 추적)

  • Ji, Jeong-Hoon;Woo, Gyun;Cho, Hwan-Gue
    • The KIPS Transactions:PartA
    • /
    • v.16A no.6
    • /
    • pp.453-462
    • /
    • 2009
  • Studies on software plagiarism detection, prevention and judgement have become widespread due to the growing of interest and importance for the protection and authentication of software intellectual property. Many previous studies focused on comparing all pairs of submitted codes by using attribute counting, token pattern, program parse tree, and similarity measuring algorithm. It is important to provide a clear-cut model for distinguishing plagiarism and collaboration. This paper proposes a source code clustering algorithm using a probability model on extreme value distribution. First, we propose an asymmetric distance measure pdist($P_a$, $P_b$) to measure the similarity of $P_a$ and $P_b$ Then, we construct the Plagiarism Direction Graph (PDG) for a given program set using pdist($P_a$, $P_b$) as edge weights. And, we transform the PDG into a Gumbel Distance Graph (GDG) model, since we found that the pdist($P_a$, $P_b$) score distribution is similar to a well-known Gumbel distribution. Second, we newly define pseudo-plagiarism which is a sort of virtual plagiarism forced by a very strong functional requirement in the specification. We conducted experiments with 18 groups of programs (more than 700 source codes) collected from the ICPC (International Collegiate Programming Contest) and KOI (Korean Olympiad for Informatics) programming contests. The experiments showed that most plagiarized codes could be detected with high sensitivity and that our algorithm successfully separated real plagiarism from pseudo plagiarism.

Deep Learning based Dynamic Taint Detection Technique for Binary Code Vulnerability Detection (바이너리 코드 취약점 탐지를 위한 딥러닝 기반 동적 오염 탐지 기술)

  • Kwang-Man Ko
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.16 no.3
    • /
    • pp.161-166
    • /
    • 2023
  • In recent years, new and variant hacking of binary codes has increased, and the limitations of techniques for detecting malicious codes in source programs and defending against attacks are often exposed. Advanced software security vulnerability detection technology using machine learning and deep learning technology for binary code and defense and response capabilities against attacks are required. In this paper, we propose a malware clustering method that groups malware based on the characteristics of the taint information after entering dynamic taint information by tracing the execution path of binary code. Malware vulnerability detection was applied to a three-layered Few-shot learning model, and F1-scores were calculated for each layer's CPU and GPU. We obtained 97~98% performance in the learning process and 80~81% detection performance in the test process.

The Analysis of the effects of the platform screen door on the fire driven flow in The Deeply Underground Subway Station (대심도 지하역사에서의 화재시 플랫폼 스크린 도어에 의한 열, 연기 거동 영향 분석)

  • Jang, Y.J.;Kim, H.B.;Lee, C.H.;Jung, W.S.
    • Proceedings of the KSME Conference
    • /
    • 2008.11b
    • /
    • pp.1984-1989
    • /
    • 2008
  • In this study, fire simulations were performed to analyze the characteristics of the fire driven flow and the effects of the platform screen door on the smoke flow in the station, when the fire occurred in the center of the platform. Soongsil Univ. station (line number 7, 47m in depth underground) was chosen which was the one of the deepest underground subway stations in the Seoul metro, SMRT. The parallel computational method was employed to compute the heat and mass transfer eqn's with 6 CPUs of the linux clustering machine. The fire driven flow was simulated with using FDS code in which LES method was applied. The Heat release rate was 10MW and The Ultrafast model was applied for the growing model of the fire source. The 10,000,000 structured grids were used.

  • PDF