• Title/Summary/Keyword: Similarity measures

A study of a image segmentation by the normalized cut (Normalized cut을 이용한 Image segmentation에 대한 연구)

  • Lee, Kyu-Han;Chung, Chin-Hyun
    • Proceedings of the KIEE Conference / 1998.07g / pp.2243-2245 / 1998
  • In this paper, we treat image segmentation as a graph partitioning problem and use the normalized cut to segment the graph. The normalized cut criterion measures both the total dissimilarity between the different groups and the total similarity within the groups. Minimizing this criterion can be formulated as a generalized eigenvalue problem, which allows it to be computed efficiently. We have applied this approach to segment static images.
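
As a rough illustration of the criterion described in this abstract, the sketch below evaluates the normalized cut of a bipartition, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), on a tiny hand-made similarity graph. It is not the paper's implementation: the weight matrix is invented for the example, and the exhaustive search stands in for the generalized eigenvalue solver, which is what makes the method practical on real images.

```python
from itertools import combinations


def ncut(weights, group_a):
    """Normalized cut value of the partition (group_a, rest) of a weighted graph.

    weights is a symmetric matrix (list of lists) of non-negative similarities.
    """
    n = len(weights)
    a = set(group_a)
    b = set(range(n)) - a
    # cut(A, B): total weight of the edges that cross the partition.
    cut = sum(weights[i][j] for i in a for j in b)
    # assoc(X, V): total weight from the nodes in X to every node in the graph.
    assoc_a = sum(weights[i][j] for i in a for j in range(n))
    assoc_b = sum(weights[i][j] for i in b for j in range(n))
    return cut / assoc_a + cut / assoc_b


def best_bipartition(weights):
    """Try every bipartition and keep the smallest Ncut (tiny graphs only)."""
    n = len(weights)
    best = None
    for size in range(1, n // 2 + 1):
        for group in combinations(range(n), size):
            value = ncut(weights, group)
            if best is None or value < best[0]:
                best = (value, set(group))
    return best


# Hypothetical graph: nodes 0-1 and 2-3 are strongly similar within each pair
# (weight 1.0) and weakly similar across pairs (weight 0.25).
W = [
    [0.0, 1.0, 0.25, 0.25],
    [1.0, 0.0, 0.25, 0.25],
    [0.25, 0.25, 0.0, 1.0],
    [0.25, 0.25, 1.0, 0.0],
]
value, group = best_bipartition(W)
print(group, round(value, 3))
```

On this toy graph the search separates the two tightly connected pairs, because that split has a small cut relative to the association of each side.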

An Object-Oriented Case-Base Design and Similarity Measures for Bundle Products Recommendation Systems (번들상품추천시스템 개발을 위한 객체지향 사례베이스 설계와 유사도 측정에 관한 연구)

  • 정대율
    • Journal of Intelligence and Information Systems / v.9 no.1 / pp.23-51 / 2003
  • With the recent expansion of Internet shopping malls, the importance of intelligent product recommendation agents has been increasing. For product recommendation, this paper proposes a case-based reasoning (CBR) approach and develops a case-based bundle product recommendation system that can recommend a set of seafood products used in family events. Applying the CBR approach to bundle product recommendation requires the following 4R steps: ① Retrieval, ② Reuse, ③ Revise, ④ Retain. To retrieve similar cases from the case base efficiently, the case representation scheme is most important. This paper uses OMT (Object Modeling Technique) to represent bundle product recommendation cases and develops a similarity measure for searching similar cases. Similarity is measured mainly with a weighted-sum approach. In particular, this paper presents the meaning and uses of taxonomies for representing case features.
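
The weighted-sum retrieval step mentioned in this abstract can be sketched as follows. This is only a minimal illustration under stated assumptions: the feature names (`event`, `guests`, `budget`), the weights, the value ranges, and the local similarity functions are all hypothetical, and the taxonomy-based feature similarity the paper describes is not modeled.

```python
def numeric_similarity(a, b, value_range):
    """Local similarity for numeric features: 1 when equal, 0 at the full range."""
    return max(0.0, 1.0 - abs(a - b) / value_range)


def exact_similarity(a, b):
    """Local similarity for categorical features: 1 on an exact match, else 0."""
    return 1.0 if a == b else 0.0


def weighted_sum_similarity(query, case, weights, local_sims):
    """Global similarity: weighted sum of per-feature similarities, in [0, 1]."""
    total_weight = sum(weights.values())
    score = sum(
        weights[f] * local_sims[f](query[f], case[f]) for f in weights
    )
    return score / total_weight


def retrieve(query, case_base, weights, local_sims, k=1):
    """Return the k cases most similar to the query (the Retrieval step)."""
    ranked = sorted(
        case_base,
        key=lambda c: weighted_sum_similarity(query, c, weights, local_sims),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical features, weights, and cases, for illustration only.
weights = {"event": 0.5, "guests": 0.3, "budget": 0.2}
local_sims = {
    "event": exact_similarity,
    "guests": lambda a, b: numeric_similarity(a, b, 50),
    "budget": lambda a, b: numeric_similarity(a, b, 200),
}
case_base = [
    {"name": "A", "event": "birthday", "guests": 12, "budget": 120},
    {"name": "B", "event": "wedding", "guests": 40, "budget": 300},
]
query = {"event": "birthday", "guests": 10, "budget": 100}
print(retrieve(query, case_base, weights, local_sims)[0]["name"])
```

Dividing by the total weight keeps the score in [0, 1] even when the weights do not sum to one, which makes thresholds easier to interpret.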

Detecting Software Similarity Using API Sequences on Static Major Paths (정적 주요 경로 API 시퀀스를 이용한 소프트웨어 유사성 검사)

  • Park, Seongsoo;Han, Hwansoo
    • Journal of KIISE / v.41 no.12 / pp.1007-1012 / 2014
  • Software birthmarks are used to detect software plagiarism. For binaries, however, only a few birthmarks have been developed. In this paper, we propose a static approach to generate API sequences along major paths, which are analyzed from control flow graphs of the binaries. Since our API sequences are extracted along the most plausible paths of the binary code, they can represent actual API sequences produced from binary executions, but in a more concise form. Our similarity measure uses the Smith-Waterman algorithm, one of the popular sequence alignment algorithms for DNA sequence analysis. We evaluate our static path-based API sequences with multiple versions of five applications. Our experiment indicates that the proposed method provides a fairly reliable similarity birthmark for binaries.
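
To make the alignment step concrete, here is a minimal sketch of Smith-Waterman local alignment scoring applied to two API call sequences. The scoring parameters, the normalization into a [0, 1] similarity, and the example API names are assumptions chosen for illustration; the abstract does not state the values or the normalization the authors use.

```python
def smith_waterman(seq_a, seq_b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two sequences (linear gap penalty)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                0,                          # start a new local alignment
                score[i - 1][j - 1] + step,  # align the two elements
                score[i - 1][j] + gap,       # skip an element of seq_a
                score[i][j - 1] + gap,       # skip an element of seq_b
            )
            best = max(best, score[i][j])
    return best


def similarity(seq_a, seq_b, match=2):
    """Score divided by the best score the shorter sequence could reach.

    This normalization is an assumption made for the example.
    """
    shorter = min(len(seq_a), len(seq_b))
    if shorter == 0:
        return 0.0
    return smith_waterman(seq_a, seq_b, match=match) / (match * shorter)


# Hypothetical API sequences extracted from two binaries.
calls_a = ["CreateFile", "ReadFile", "WriteFile", "CloseHandle"]
calls_b = ["OpenProcess", "CreateFile", "ReadFile", "CloseHandle"]
print(smith_waterman(calls_a, calls_b), similarity(calls_a, calls_b))
```

Because the alignment is local, the extra leading call in `calls_b` costs nothing, while the `WriteFile` call that only `calls_a` contains is absorbed as a single gap.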

An Experimental Study on Feature Selection Using Wikipedia for Text Categorization (위키피디아를 이용한 분류자질 선정에 관한 연구)

  • Kim, Yong-Hwan;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.29 no.2 / pp.155-171 / 2012
  • In text categorization, core terms of an input document are rarely selected as classification features if they do not occur in the training document set. In addition, synonymous terms with the same concept are usually treated as different features. This study aims to improve text categorization performance by integrating synonyms into a single feature and by using Wikipedia to replace input terms that are not in the training document set with the most similar term occurring in the training documents. For the selection of classification features, experiments were performed in various settings composed of three different conditions: the use of category information of non-training terms, the part of Wikipedia used for measuring term-term similarity, and the type of similarity measure. The categorization performance of a kNN classifier improved by 0.35~1.85% in $F_1$ value in all the experimental settings when non-training terms were replaced by the training term with the highest similarity above the threshold value. Although the improvement ratio is not as high as expected, several semantic as well as structural devices of Wikipedia could be used for selecting more effective classification features.
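
The replacement rule described in this abstract (swap a term unseen in training for the most similar training term, provided the similarity clears a threshold) can be sketched as below. The Jaccard measure over link sets, the `links` table, and the threshold are hypothetical stand-ins; the study compares several Wikipedia-based similarity measures that the abstract does not specify.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def replace_unseen_terms(doc_terms, training_vocab, similarity, threshold):
    """Replace terms missing from the training vocabulary.

    Each unseen term is swapped for the most similar training term if that
    similarity reaches the threshold; otherwise the term is left unchanged.
    """
    result = []
    for term in doc_terms:
        if term in training_vocab:
            result.append(term)
            continue
        best_term, best_score = None, 0.0
        for candidate in sorted(training_vocab):
            score = similarity(term, candidate)
            if score > best_score:
                best_term, best_score = candidate, score
        if best_term is not None and best_score >= threshold:
            result.append(best_term)
        else:
            result.append(term)
    return result


# Hypothetical link sets standing in for Wikipedia-derived term descriptions.
links = {
    "automobile": {"Vehicle", "Engine", "Road"},
    "car": {"Vehicle", "Engine", "Road", "Wheel"},
    "banana": {"Fruit", "Plant"},
}


def link_similarity(a, b):
    return jaccard(links.get(a, set()), links.get(b, set()))


training_vocab = {"car", "banana"}
doc = ["automobile", "banana", "quasar"]
print(replace_unseen_terms(doc, training_vocab, link_similarity, 0.5))
```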

Evaluation of shape similarity for 3D models (3차원 모델을 위한 형상 유사성 평가)

  • Kim, Jeong-Sik;Choi, Soo-Mi
    • The KIPS Transactions:PartA / v.10A no.4 / pp.357-368 / 2003
  • Evaluation of shape similarity for 3D models is essential in many areas, including medicine, mechanical engineering, and molecular biology. Moreover, as 3D models are commonly used on the Web, much research has been conducted on the classification and retrieval of 3D models. In this paper, we describe methods for 3D shape representation and the major concepts of similarity evaluation, and analyze the key features of recent studies on shape comparison after classifying them into four categories: multi-resolution, topology, 2D image, and statistics based methods. In addition, we evaluate the performance of the reviewed methods by selected criteria such as uniqueness, robustness, invariance, multi-resolution, efficiency, and comparison scope. Multi-resolution based methods decreased the computation time for comparison but increased the preprocessing time. The methods using geometric and topological information were able to compare a wider variety of models and were robust in partial shape comparison. 2D image based methods incurred overheads in time and space complexity. Statistics based methods allowed for shape comparison without pose normalization and showed robustness against affine transformations and noise.

An Index-Based Search Method for Performance Improvement of Set-Based Similar Sequence Matching (집합 유사 시퀀스 매칭의 성능 향상을 위한 인덱스 기반 검색 방법)

  • Lee, Juwon;Lim, Hyo-Sang
    • KIPS Transactions on Software and Data Engineering / v.6 no.11 / pp.507-520 / 2017
  • The set-based similar sequence matching method measures similarity not for an individual data item but for a set grouping multiple data items. In this method, the similarity of two sets is represented as the size of the intersection between them. However, the method has two critical performance issues: 1) calculating the intersection size is a time-consuming process, and 2) the number of set pairs whose intersection size must be calculated is quite large. In this paper, we propose an index-based search method that improves the performance of set-based similar sequence matching in order to solve these issues. Our method consists of two parts. First, we convert the set similarity problem into an intersection size comparison problem and then provide an index structure that accelerates the intersection size calculation. Second, we propose an efficient set-based similar sequence matching method that exploits the proposed index structure. Through experiments, we show that the proposed method reduces the execution time by a factor of 30 to 50 compared with the existing methods. We also show that the proposed method is scalable, since the performance gap becomes larger as the number of data sequences increases.
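
One common way to avoid computing intersections pair by pair is an inverted index from items to the sets containing them, sketched below. This is a generic technique swapped in for illustration, not the index structure proposed in the paper (which the abstract does not describe); the set ids, items, and the `min_overlap` threshold are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(sets):
    """Map each item to the ids of the sets that contain it."""
    index = defaultdict(set)
    for set_id, items in sets.items():
        for item in items:
            index[item].add(set_id)
    return index


def intersection_sizes(query, index):
    """Intersection size with every indexed set that shares an item with query."""
    counts = defaultdict(int)
    for item in set(query):
        for set_id in index.get(item, ()):
            counts[set_id] += 1
    return dict(counts)


def similar_sets(query, index, min_overlap):
    """Ids of the sets whose intersection with query has at least min_overlap items."""
    sizes = intersection_sizes(query, index)
    return sorted(sid for sid, size in sizes.items() if size >= min_overlap)


# Hypothetical data: each id names a set of items grouped from a sequence.
data_sets = {"s1": {1, 2, 3, 4}, "s2": {3, 4, 5}, "s3": {7, 8}}
index = build_inverted_index(data_sets)
print(similar_sets({2, 3, 4, 5}, index, min_overlap=3))
```

Sets that share no item with the query are never visited, which is where the saving comes from when most pairs do not overlap.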

A New Statistical Index for Detecting Cheaters on Multiple Choice Tests (다중선택 시험에서 부정행위자 발견을 위한 새로운 통계적 측도)

  • Han, Eun Su;Lim, Johan;Lee, Kyeong Eun
    • The Korean Journal of Applied Statistics / v.26 no.1 / pp.81-92 / 2013
  • It is important to construct a firm basis for accusing potential violators of academic integrity in order to avoid spurious accusations and false convictions. Educational researchers have developed many statistical methods that can either uncover or confirm cases of cheating on tests. However, most of them rely on simple correlation-based measures and often fail to account for patterns in responses or answers. In this paper, we propose a new statistical index, the Standardized Signed Entropy Similarity Score, to resolve this difficulty. In addition, we apply the proposed method to a real data set and compare the results with those of other existing methods.

Implementation of an Efficient Requirements Analysis supporting System using Similarity Measure Techniques (유사도 측정 기법을 이용한 효율적인 요구 분석 지원 시스템의 구현)

  • Kim, Hark-Soo;Ko, Young-Joong;Park, Soo-Yong;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications / v.27 no.1 / pp.13-23 / 2000
  • As software becomes more complicated and larger in scale, users' demands become more varied and their expectations of software products rise. It is therefore very important that a software engineer analyzes users' requirements precisely and applies them effectively in the development step. This paper presents a requirements analysis system that effectively reduces and corrects errors in the analysis of requirements specifications. Because the system measures the similarity among requirements documents and sentences, it assists users in analyzing the dependency among requirements specifications and in finding traceability, redundancy, inconsistency, and incompleteness among requirements sentences. It also extracts sentences that contain ambiguous words. The indexing method for the similarity measurement combines a sliding window model and a dependency structure model, so that each model compensates for the other's weaknesses. This paper verifies the efficiency of the similarity measure techniques through experiments and presents a process of requirements specifications analysis using the implemented system.

Semantic Conceptual Relational Similarity Based Web Document Clustering for Efficient Information Retrieval Using Semantic Ontology

  • Selvalakshmi, B;Subramaniam, M;Sathiyasekar, K
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.9 / pp.3102-3119 / 2021
  • In the rapidly growing modern web era, the scope of web publication is about accessing web resources. Because of the increased size of the web, search engines face many challenges in indexing web pages as well as in producing results for user queries. The methodologies for clustering web documents discussed in the literature struggle to achieve high clustering accuracy. This problem is mitigated by the proposed scheme, a Semantic Conceptual Relational Similarity (SCRS) based clustering algorithm, which considers the relationship of a document in two ways to measure similarity. One is the number of semantic relations of a document class covered by the input document, and the second is the number of conceptual relations the input document covers towards a document class. Given a data set Ds, the method estimates the SCRS measure for each document Di towards the available classes of documents. As a result, the class with the maximum SCRS is identified and the document is indexed under the selected class. The SCRS measure is computed according to the semantic relevancy of the input document towards each document of a class. Similarly, the input query is measured for a Query Relational Semantic Score (QRSS) towards each class of documents. Based on the value of the QRSS measure, the document class is identified, retrieved, and ranked to produce the final population. In both cases, the semantic measures are estimated based on the concepts available in the semantic ontology. The proposed method produces efficient indexing results, and search efficiency is also improved.

Categorizing accident sequences in the external radiotherapy for risk analysis

  • Kim, Jonghyun
    • Radiation Oncology Journal / v.31 no.2 / pp.88-96 / 2013
  • Purpose: This study identifies accident sequences from past accidents in order to support the application of risk analysis to external radiotherapy. Materials and Methods: This study reviews 59 accident cases in two retrospective safety analyses that have extensively collected incidents in external radiotherapy. Two accident analysis reports that accumulated past incidents are investigated to identify accident sequences, including initiating events, failures of safety measures, and consequences. This study classifies the accidents by the treatment stages and sources of error for initiating events, the types of failure in the safety measures, and the types of undesirable consequence and the number of affected patients. Then, the accident sequences are grouped into several categories on the basis of similarity of progression. As a result, these cases can be categorized into 14 groups of accident sequences. Results: The results indicate that risk analysis needs to pay attention not only to the planning stage but also to the calibration stage that is performed prior to the main treatment process. They also show that human error is the largest contributor to initiating events as well as to the failure of safety measures. This study also illustrates an event tree analysis for an accident sequence initiated in the calibration stage. Conclusion: This study is expected to provide insights into accident sequences for prospective risk analysis through the review of past experience.