• Title/Summary/Keyword: binary level similarity

Search Result 9, Processing Time 0.027 seconds

Cross-architecture Binary Function Similarity Detection based on Composite Feature Model

  • Xiaonan Li;Guimin Zhang;Qingbao Li;Ping Zhang;Zhifeng Chen;Jinjin Liu;Shudan Yue
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.8
    • /
    • pp.2101-2123
    • /
    • 2023
  • Recent studies have shown that the neural network-based binary code similarity detection technology performs well in vulnerability mining, plagiarism detection, and malicious code analysis. However, existing cross-architecture methods still suffer from insufficient feature characterization and low discrimination accuracy. To address these issues, this paper proposes a cross-architecture binary function similarity detection method based on composite feature model (SDCFM). Firstly, the binary function is converted into vector representation according to the proposed composite feature model, which is composed of instruction statistical features, control flow graph structural features, and application program interface calling behavioral features. Then, the composite features are embedded by the proposed hierarchical embedding network based on a graph neural network. In which, the block-level features and the function-level features are processed separately and finally fused into the embedding. In addition, to make the trained model more accurate and stable, our method utilizes the embeddings of predecessor nodes to modify the node embedding in the iterative updating process of the graph neural network. To assess the effectiveness of composite feature model, we contrast SDCFM with the state of art method on benchmark datasets. The experimental results show that SDCFM has good performance both on the area under the curve in the binary function similarity detection task and the vulnerable candidate function ranking in vulnerability search task.

Cluster Analysis with Balancing Weight on Mixed-type Data

  • Chae, Seong-San;Kim, Jong-Min;Yang, Wan-Youn
    • Communications for Statistical Applications and Methods
    • /
    • v.13 no.3
    • /
    • pp.719-732
    • /
    • 2006
  • A set of clustering algorithms with proper weight on the formulation of distance which extend to mixed numeric and multiple binary values is presented. A simple matching and Jaccard coefficients are used to measure similarity between objects for multiple binary attributes. Similarities are converted to dissimilarities between i th and j th objects. The performance of clustering algorithms with balancing weight on different similarity measures is demonstrated. Our experiments show that clustering algorithms with application of proper weight give competitive recovery level when a set of data with mixed numeric and multiple binary attributes is clustered.

Detecting Software Similarity Using API Sequences on Static Major Paths (정적 주요 경로 API 시퀀스를 이용한 소프트웨어 유사성 검사)

  • Park, Seongsoo;Han, Hwansoo
    • Journal of KIISE
    • /
    • v.41 no.12
    • /
    • pp.1007-1012
    • /
    • 2014
  • Software birthmarks are used to detect software plagiarism. For binaries, however, only a few birthmarks have been developed. In this paper, we propose a static approach to generate API sequences along major paths, which are analyzed from control flow graphs of the binaries. Since our API sequences are extracted along the most plausible paths of the binary codes, they can represent actual API sequences produced from binary executions, but in a more concise form. Our similarity measures use the Smith-Waterman algorithm that is one of the popular sequence alignment algorithms for DNA sequence analysis. We evaluate our static path-based API sequence with multiple versions of five applications. Our experiment indicates that our proposed method provides a quite reliable similarity birthmark for binaries.

Detection of an Open-Source Software Module based on Function-level Features (함수 수준 특징정보 기반의 오픈소스 소프트웨어 모듈 탐지)

  • Kim, Dongjin;Cho, Seong-je
    • Journal of KIISE
    • /
    • v.42 no.6
    • /
    • pp.713-722
    • /
    • 2015
  • As open-source software (OSS) becomes more widely used, many users breach the terms in the license agreement of OSS, or reuse a vulnerable OSS module. Therefore, a technique needs to be developed for investigating if a binary program includes an OSS module. In this paper, we propose an efficient technique to detect a particular OSS module in an executable program using its function-level features. The conventional methods are inappropriate for determining whether a module is contained in a specific program because they usually measure the similarity between whole programs. Our technique determines whether an executable program contains a certain OSS module by extracting features such as its function-level instructions, control flow graph, and the structural attributes of a function from both the program and the module, and comparing the similarity of features. In order to demonstrate the efficiency of the proposed technique, we evaluate it in terms of the size of features, detection accuracy, execution overhead, and resilience to compiler optimizations.

Software Similarity Detection Using Highly Credible Dynamic API Sequences (신뢰성 높은 동적 API 시퀀스를 이용한 소프트웨어 유사성 검사)

  • Park, Seongsoo;Han, Hwansoo
    • Journal of KIISE
    • /
    • v.43 no.10
    • /
    • pp.1067-1072
    • /
    • 2016
  • Software birthmarks, which are unique characteristics of the software, are used to detect software plagiarism or software similarity. Generally, software birthmarks are divided into static birthmarks or dynamic birthmarks, which have evident pros and cons depending on the extraction method. In this paper, we propose a method for extracting the API sequence birthmarks using a dynamic analysis and similarity detection between the executable codes. Dynamic birthmarks based on API sequences extract API functions during the execution of programs. The extracted API sequences often include all the API functions called from the start to the end of the program. Meanwhile, our dynamic birthmark scheme extracts the API functions only called directly from the executable code. Then, it uses a sequence alignment algorithm to calculate the similarity metric effectively. We evaluate the birthmark with several open source software programs to verify its reliability and credibility. Our dynamic birthmark scheme based on the extracted API sequence can be utilized in a similarity test of executable codes.

Authorship Attribution Framework Using Survival Network Concept : Semantic Features and Tolerances (서바이벌 네트워크 개념을 이용한 저자 식별 프레임워크: 의미론적 특징과 특징 허용 범위)

  • Hwang, Cheol-Hun;Shin, Gun-Yoon;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.30 no.6
    • /
    • pp.1013-1021
    • /
    • 2020
  • Malware Authorship Attribution is a research field for identifying malware by comparing the author characteristics of unknown malware with the characteristics of known malware authors. The authorship attribution method using binaries has the advantage that it is easy to collect and analyze targeted malicious codes, but the scope of using features is limited compared to the method using source code. This limitation has the disadvantage that accuracy decreases for a large number of authors. This study proposes a method of 'Defining semantic features from binaries' and 'Defining allowable ranges for redundant features using the concept of survival network' to complement the limitations in the identification of binary authors. The proposed method defines Opcode-based graph features from binary information, and defines the allowable range for selecting unique features for each author using the concept of a survival network. Through this, it was possible to define the feature definition and feature selection method for each author as a single technology, and through the experiment, it was confirmed that it was possible to derive the same level of accuracy as the source code-based analysis with an improvement of 5.0% accuracy compared to the previous study.

Local Shape Analysis of the Hippocampus using Hierarchical Level-of-Detail Representations (계층적 Level-of-Detail 표현을 이용한 해마의 국부적인 형상 분석)

  • Kim Jeong-Sik;Choi Soo-Mi;Choi Yoo-Ju;Kim Myoung-Hee
    • The KIPS Transactions:PartA
    • /
    • v.11A no.7 s.91
    • /
    • pp.555-562
    • /
    • 2004
  • Both global volume reduction and local shape changes of hippocampus within the brain indicate their abnormal neurological states. Hippocampal shape analysis consists of two main steps. First, construct a hippocampal shape representation model ; second, compute a shape similarity from this representation. This paper proposes a novel method for the analysis of hippocampal shape using integrated Octree-based representation, containing meshes, voxels, and skeletons. First of all, we create multi-level meshes by applying the Marching Cube algorithm to the hippocampal region segmented from MR images. This model is converted to intermediate binary voxel representation. And we extract the 3D skeleton from these voxels using the slice-based skeletonization method. Then, in order to acquire multiresolutional shape representation, we store hierarchically the meshes, voxels, skeletons comprised in nodes of the Octree, and we extract the sample meshes using the ray-tracing based mesh sampling technique. Finally, as a similarity measure between the shapes, we compute $L_2$ Norm and Hausdorff distance for each sam-pled mesh pair by shooting the rays fired from the extracted skeleton. As we use a mouse picking interface for analyzing a local shape inter-actively, we provide an interaction and multiresolution based analysis for the local shape changes. In this paper, our experiment shows that our approach is robust to the rotation and the scale, especially effective to discriminate the changes between local shapes of hippocampus and more-over to increase the speed of analysis without degrading accuracy by using a hierarchical level-of-detail approach.

A study on image region analysis and image enhancement using detail descriptor (디테일 디스크립터를 이용한 이미지 영역 분석과 개선에 관한 연구)

  • Lim, Jae Sung;Jeong, Young-Tak;Lee, Ji-Hyeok
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.18 no.6
    • /
    • pp.728-735
    • /
    • 2017
  • With the proliferation of digital devices, the devices have generated considerable additive white Gaussian noise while acquiring digital images. The most well-known denoising methods focused on eliminating the noise, so detailed components that include image information were removed proportionally while eliminating the image noise. The proposed algorithm provides a method that preserves the details and effectively removes the noise. In this proposed method, the goal is to separate meaningful detail information in image noise environment using the edge strength and edge connectivity. Consequently, even as the noise level increases, it shows denoising results better than the other benchmark methods because proposed method extracts the connected detail component information. In addition, the proposed method effectively eliminated the noise for various noise levels; compared to the benchmark algorithms, the proposed algorithm shows a highly structural similarity index(SSIM) value and peak signal-to-noise ratio(PSNR) value, respectively. As shown the result of high SSIMs, it was confirmed that the SSIMs of the denoising results includes a human visual system(HVS).

A Study on the Effect of Network Centralities on Recommendation Performance (네트워크 중심성 척도가 추천 성능에 미치는 영향에 대한 연구)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.23-46
    • /
    • 2021
  • Collaborative filtering, which is often used in personalization recommendations, is recognized as a very useful technique to find similar customers and recommend products to them based on their purchase history. However, the traditional collaborative filtering technique has raised the question of having difficulty calculating the similarity for new customers or products due to the method of calculating similaritiesbased on direct connections and common features among customers. For this reason, a hybrid technique was designed to use content-based filtering techniques together. On the one hand, efforts have been made to solve these problems by applying the structural characteristics of social networks. This applies a method of indirectly calculating similarities through their similar customers placed between them. This means creating a customer's network based on purchasing data and calculating the similarity between the two based on the features of the network that indirectly connects the two customers within this network. Such similarity can be used as a measure to predict whether the target customer accepts recommendations. The centrality metrics of networks can be utilized for the calculation of these similarities. Different centrality metrics have important implications in that they may have different effects on recommended performance. In this study, furthermore, the effect of these centrality metrics on the performance of recommendation may vary depending on recommender algorithms. In addition, recommendation techniques using network analysis can be expected to contribute to increasing recommendation performance even if they apply not only to new customers or products but also to entire customers or products. By considering a customer's purchase of an item as a link generated between the customer and the item on the network, the prediction of user acceptance of recommendation is solved as a prediction of whether a new link will be created between them. As the classification models fit the purpose of solving the binary problem of whether the link is engaged or not, decision tree, k-nearest neighbors (KNN), logistic regression, artificial neural network, and support vector machine (SVM) are selected in the research. The data for performance evaluation used order data collected from an online shopping mall over four years and two months. Among them, the previous three years and eight months constitute social networks composed of and the experiment was conducted by organizing the data collected into the social network. The next four months' records were used to train and evaluate recommender models. Experiments with the centrality metrics applied to each model show that the recommendation acceptance rates of the centrality metrics are different for each algorithm at a meaningful level. In this work, we analyzed only four commonly used centrality metrics: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. Eigenvector centrality records the lowest performance in all models except support vector machines. Closeness centrality and betweenness centrality show similar performance across all models. Degree centrality ranking moderate across overall models while betweenness centrality always ranking higher than degree centrality. Finally, closeness centrality is characterized by distinct differences in performance according to the model. It ranks first in logistic regression, artificial neural network, and decision tree withnumerically high performance. However, it only records very low rankings in support vector machine and K-neighborhood with low-performance levels. As the experiment results reveal, in a classification model, network centrality metrics over a subnetwork that connects the two nodes can effectively predict the connectivity between two nodes in a social network. Furthermore, each metric has a different performance depending on the classification model type. This result implies that choosing appropriate metrics for each algorithm can lead to achieving higher recommendation performance. In general, betweenness centrality can guarantee a high level of performance in any model. It would be possible to consider the introduction of proximity centrality to obtain higher performance for certain models.