• Title/Summary/Keyword: similarity-based

Development of Similarity-Based Document Clustering System (유사성 계수에 의한 문서 클러스터링 시스템 개발)

  • Woo Hoon-Shik;Yim Dong-Soon
    • Proceedings of the Society of Korea Industrial and System Engineering Conference / 2002.05a / pp.119-124 / 2002
  • Clustering of data is of great interest in many data mining applications. In document clustering, a document is represented as a data point in a high-dimensional space, so documents can be clustered with general data clustering techniques. In this paper, we introduce a document clustering system based on the similarity among documents. The developed system has three functions: 1) gathering documents with a search agent; 2) computing similarity coefficients between every pair of documents from term frequencies; and 3) clustering documents using those similarity coefficients. In particular, documents are clustered by a hybrid algorithm that combines genetic and K-Means methods.
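The K-Means half of the clustering step can be illustrated with a minimal Python sketch. It assumes whitespace tokenization, raw term-frequency vectors scaled to unit length, and plain K-Means seeded with the first k documents. The genetic-algorithm stage of the authors' hybrid method is not modeled, and all function names are hypothetical.

```python
import math
from collections import Counter


def tf_vectors(docs):
    """Build term-frequency vectors over a shared, sorted vocabulary (whitespace tokens)."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]


def unit(v):
    """Scale a vector to unit Euclidean length so document length does not dominate."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def kmeans(vectors, k, iters=20):
    """Plain K-Means seeded with the first k vectors (the paper's GA stage is omitted)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centroids[j])))
            for v in vectors
        ]
        # Update: mean of the members; an empty cluster keeps its old centroid.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "genetic algorithm search",
    "stock market price",
    "genetic algorithm crossover",
    "market price index",
]
_, vecs = tf_vectors(docs)
labels = kmeans([unit(v) for v in vecs], k=2)
print(labels)  # [0, 1, 0, 1]
```

Plain K-Means is sensitive to its initial centroids (the naive seeding above only works because the first two documents happen to be on different topics); a genetic search over candidate solutions is one common way hybrid methods try to reduce that sensitivity.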

A New Approach to Find Orthologous Proteins Using Sequence and Protein-Protein Interaction Similarity

  • Kim, Min-Kyung;Seol, Young-Joo;Park, Hyun-Seok;Jang, Seung-Hwan;Shin, Hang-Cheol;Cho, Kwang-Hwi
    • Genomics & Informatics / v.7 no.3 / pp.141-147 / 2009
  • Existing proteome-scale ortholog and paralog prediction methods are based mainly on sequence similarity. However, it is known that even the closest BLAST hit is often not the closest neighbor. For this reason, we added conserved interaction information to find orthologs. We propose a genome-scale, automated ortholog prediction method named OrthoInterBlast, which is based on both sequence and interaction similarity. When we applied this method to fly and yeast, 17% of the ortholog candidates differed from those found by Inparanoid. By adding protein-protein interaction information, proteins with low sequence similarity, which sequence homology alone cannot easily detect, can still be selected as orthologs.
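One way to picture "interaction similarity" is a neighborhood-overlap score, sketched below in Python under stated assumptions: the interaction partners of a candidate protein are translated into the other species through a given sequence-based ortholog map, compared with the other protein's partners by Jaccard overlap, and blended linearly with a normalized sequence score. The Jaccard measure, the 0.5 weight, and every function and protein name here are hypothetical illustrations, not the scoring used by OrthoInterBlast.

```python
def interaction_similarity(a, b, ppi_x, ppi_y, mapping):
    """Jaccard overlap of a's partners (mapped into species Y) with b's partners.

    ppi_x, ppi_y: dict mapping a protein to the set of its interaction partners.
    mapping: species-X protein -> its sequence-based ortholog candidate in Y (assumed given).
    """
    mapped = {mapping[p] for p in ppi_x.get(a, set()) if p in mapping}
    partners_b = ppi_y.get(b, set())
    union = mapped | partners_b
    return len(mapped & partners_b) / len(union) if union else 0.0


def combined_score(seq_sim, int_sim, weight=0.5):
    """Hypothetical linear blend of a [0, 1] sequence score and interaction similarity."""
    return weight * seq_sim + (1 - weight) * int_sim


ppi_fly = {"A": {"B", "C"}}
ppi_yeast = {"a": {"b", "c", "d"}}
orthomap = {"B": "b", "C": "c"}
s_int = interaction_similarity("A", "a", ppi_fly, ppi_yeast, orthomap)
print(round(s_int, 3), round(combined_score(0.2, s_int), 3))  # 0.667 0.433
```

Here the mapped partners of A cover two of the three proteins in the union, so a pair with weak sequence similarity (0.2) still receives a moderate combined score, in line with the motivation stated in the abstract.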

Semantic Mapping of Terms Based on Their Ontological Definitions and Similarities (온톨로지 기반의 용어 정의 비교 및 유사도를 고려한 의미 매핑)

  • Jung W.C.;Lee J.H.;Suh H.W.
    • Korean Journal of Computational Design and Engineering / v.11 no.3 / pp.211-222 / 2006
  • In a collaborative environment, participants must share the same understanding of the semantics of terms. For example, they should know that 'COMPONENT' and 'ITEM' are different words with the same meaning. To handle such problems in information sharing, an information system needs to recognize automatically that two terms have the same semantics. We therefore develop an algorithm that maps two terms based on their ontological definitions and their similarities. The proposed algorithm consists of four steps: character matching, inferencing, definition comparison, and similarity checking. In the similarity checking step, we consider relation similarity and hierarchical similarity. The algorithm is still rudimentary, but it shows the possibility of semi-automatic mapping using ontology. In addition, we design a mapping procedure for a mapping system called SOM (semantic ontology mapper).
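For the hierarchical-similarity part of the similarity checking step, a common textbook choice is the Wu-Palmer measure over an is-a hierarchy. The Python sketch below uses it as a stand-in: the abstract does not say which hierarchical measure SOM uses, and the toy hierarchy (with 'COMPONENT' and 'ITEM' both under a hypothetical 'PART' concept) is invented for illustration.

```python
def ancestors(term, parent):
    """Path from a term up to the root (inclusive), following a child -> parent map."""
    path = [term]
    while term in parent:
        term = parent[term]
        path.append(term)
    return path


def wu_palmer(a, b, parent):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b)), with root depth 1."""
    path_a, path_b = ancestors(a, parent), ancestors(b, parent)
    in_b = set(path_b)
    # The lowest common subsumer is the first ancestor of a that is also an ancestor of b.
    lcs = next((t for t in path_a if t in in_b), None)
    if lcs is None:
        return 0.0
    return 2 * len(ancestors(lcs, parent)) / (len(path_a) + len(path_b))


# Toy is-a hierarchy (hypothetical): child -> parent
hierarchy = {"COMPONENT": "PART", "ITEM": "PART", "PART": "THING", "TOOL": "THING"}
print(round(wu_palmer("COMPONENT", "ITEM", hierarchy), 3))  # 0.667
print(round(wu_palmer("COMPONENT", "TOOL", hierarchy), 3))  # 0.4
```

Siblings under the same specific parent score higher than terms that meet only at the root, which is the intuition a hierarchical similarity check relies on.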

Using Fuzzy Rating Information for Collaborative Filtering-based Recommender Systems

  • Lee, Soojung
    • International Journal of Advanced Smart Convergence / v.9 no.3 / pp.42-48 / 2020
  • People today are overwhelmed by information on the Internet, so searching for useful information has become burdensome and often fails to turn up anything within a reasonable time. Recommender systems, deployed on many commercial sites, have become indispensable for meeting such user needs. This study proposes a novel similarity measure for user-based collaborative filtering, one of the most popular techniques for recommender systems. Compared with existing similarity measures, the suggested measure has two main advantages: it takes all the ratings given by users into account when computing similarity, which relieves the inherent data sparsity problem, and it reflects the uncertainty or vagueness of user ratings through fuzzy logic. The performance of the proposed measure is examined through extensive experiments, which show that it outperforms previous related measures on the major quality metrics.
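The fuzzy side of such a measure can be sketched by mapping each rating on a 1-5 scale to memberships in three triangular fuzzy sets (low, medium, high) and scoring two ratings by the overlap of their memberships. The Python sketch below is a hypothetical illustration under those assumptions: it averages the overlap over co-rated items only, so it does not reproduce the paper's use of all ratings, and the set shapes and function names are invented.

```python
def memberships(r):
    """Triangular fuzzy sets on a 1-5 rating scale; the three degrees sum to 1."""
    low = max(0.0, (3 - r) / 2)
    medium = max(0.0, 1 - abs(r - 3) / 2)
    high = max(0.0, (r - 3) / 2)
    return (low, medium, high)


def rating_agreement(r1, r2):
    """Fuzzy overlap of two ratings: sum of element-wise minimum memberships."""
    return sum(min(m1, m2) for m1, m2 in zip(memberships(r1), memberships(r2)))


def fuzzy_user_similarity(u, v):
    """Mean fuzzy agreement over co-rated items (u, v: dict item -> rating)."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    return sum(rating_agreement(u[i], v[i]) for i in common) / len(common)


alice = {"i1": 5, "i2": 3, "i3": 1}
bob = {"i1": 4, "i2": 3, "i4": 2}
print(fuzzy_user_similarity(alice, bob))  # 0.75
```

Unlike an exact-match count, a 5 and a 4 still earn partial credit (0.5 here), which is the kind of rating vagueness the abstract says fuzzy logic captures.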

A bidirectional fuzzy inference network for interval-valued decision making systems (구간 결정값을 갖는 의사결정시스템의 양방향 퍼지 추론망)

  • 전명근
    • Journal of the Korean Institute of Telematics and Electronics C / v.34C no.10 / pp.98-105 / 1997
  • In this work, we present a bidirectional approximate reasoning method and a fuzzy inference network for interval-valued decision making systems. To this end, we propose a new similarity measure between two fuzzy vectors based on the Ordered Weighted Averaging (OWA) operator. Because the proposed measure attains its extreme values under a suitable choice of the OWA weighting vector, it can yield an interval-valued similarity. From this property, we derive a bidirectional approximate reasoning method based on the similarity measure and show its fuzzy inference network implementation for decision making systems that require interval-valued decisions.
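The interval-valued idea can be sketched with the OWA operator itself: OWA sorts its arguments in descending order before weighting them, so the weight vector (1, 0, ..., 0) returns the maximum and (0, ..., 0, 1) returns the minimum. The Python sketch below applies both extremes to a per-component similarity 1 - |a_i - b_i| between two fuzzy vectors; that per-component form and the function names are assumptions, not the measure defined in the paper.

```python
def owa(values, weights):
    """Ordered Weighted Averaging: weights are applied to the values sorted in descending order."""
    if len(values) != len(weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per value, summing to 1")
    return sum(w * v for w, v in zip(weights, sorted(values, reverse=True)))


def componentwise_similarity(a, b):
    """Assumed per-component similarity of two fuzzy vectors with memberships in [0, 1]."""
    return [1.0 - abs(x - y) for x, y in zip(a, b)]


def interval_similarity(a, b):
    """Lower and upper similarity from the two extreme OWA weighting vectors."""
    s = componentwise_similarity(a, b)
    n = len(s)
    lower = owa(s, [0.0] * (n - 1) + [1.0])  # all weight on the smallest value -> min
    upper = owa(s, [1.0] + [0.0] * (n - 1))  # all weight on the largest value -> max
    return lower, upper


lo, hi = interval_similarity([0.2, 0.8, 0.5], [0.4, 0.8, 0.1])
print(round(lo, 3), round(hi, 3))  # 0.6 1.0
```

Any other valid weight vector, such as equal weights, gives a value between these two bounds.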

Development of A Web Mining System Based On Document Similarity (문서 유사도 기반의 웹 마이닝 시스템 개발)

  • 이강찬;민재홍;박기식;임동순;우훈식
    • The Journal of Society for e-Business Studies / v.7 no.1 / pp.75-86 / 2002
  • In this study, drawing on our development experience, we present the design issues and structure of a web mining system and develop a system for knowledge integration in World Wide Web environments. The developed system consists of three main functions: 1) gathering documents with a search agent; 2) computing similarity coefficients between every pair of documents from term frequencies; and 3) clustering documents based on the similarity coefficients. The developed system is expected to be useful for knowledge discovery in relatively narrow domains, such as news classification and index term generation in knowledge management.
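Functions 2) and 3) above can be sketched in a few lines of Python. The sketch assumes cosine similarity as the similarity coefficient and groups documents by connected components of a thresholded similarity graph; the abstract names neither the coefficient nor the clustering rule, so both choices, the 0.5 threshold, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(c1, c2):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(c1[t] * c2[t] for t in c1.keys() & c2.keys())
    n1 = math.sqrt(sum(x * x for x in c1.values()))
    n2 = math.sqrt(sum(x * x for x in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def similarity_matrix(docs):
    """Pairwise similarity coefficients computed from term frequencies."""
    tfs = [Counter(d.lower().split()) for d in docs]
    return [[cosine(a, b) for b in tfs] for a in tfs]


def threshold_clusters(sim, threshold):
    """Label documents by connected components of the graph with edges sim >= threshold."""
    parent = list(range(len(sim)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(sim)):
        for j in range(i + 1, len(sim)):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(sim))]


docs = ["news stock market", "stock market index", "football match score"]
sim = similarity_matrix(docs)
print(round(sim[0][1], 3))  # 0.667
print(threshold_clusters(sim, 0.5))  # [1, 1, 2]
```

The two market-related documents share a label, while the sports document forms its own group.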

Bypass-Based Star Aggregation Using Link Attributes for Improving the Information Accuracy

  • Kwon, Sora;Jeon, Changho
    • Journal of Communications and Networks / v.17 no.4 / pp.428-439 / 2015
  • In this study, we present an approach, based on bypass links, for reducing the information inaccuracy of existing star aggregation when there are multi-constraint QoS parameters in asymmetric networks. In our approach, bypass links with low similarity are selected. Links that are not chosen as bypass links are assigned to groups according to the star's link characteristics, and each link group is aggregated differently according to the similarity of the links that make up the group. Selecting bypass links by link similarity simplifies the selection process and reduces the existing time complexity from $O(N^3)$ to $O(N)$. In addition, the adaptive integration according to the characteristics of the links in each group is designed to reduce the information inaccuracy caused by static aggregation. Simulation results show that the proposed method keeps information distortion low: it is 3.8 times lower than that of the existing method, even as the number of nodes in the network increases.
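The rule "bypass links with low similarity are selected" can be illustrated with a hypothetical Python sketch. It assumes each logical link between border nodes carries a (delay, bandwidth) pair, measures a link's similarity as one minus its largest relative deviation from the mean attributes, keeps links below a threshold as bypass links, and aggregates the rest by mean delay and bottleneck (minimum) bandwidth. The similarity formula, the threshold, and the aggregation rule are all assumptions for illustration and are not the paper's method; the sketch does make a single pass over the links after computing the means, so its cost grows linearly with the number of links.

```python
def select_bypass(links, threshold):
    """Split logical links into bypass links and an aggregated remainder.

    links: dict (src, dst) -> (delay, bandwidth); all values assumed positive.
    Returns (bypass_links, (aggregated_delay, aggregated_bandwidth) or None).
    """
    n = len(links)
    mean_delay = sum(d for d, _ in links.values()) / n
    mean_bw = sum(b for _, b in links.values()) / n

    def similarity(attrs):
        # 1 minus the largest relative deviation from the mean, clipped at 0 (assumed form).
        d, b = attrs
        dev = max(abs(d - mean_delay) / mean_delay, abs(b - mean_bw) / mean_bw)
        return max(0.0, 1.0 - dev)

    bypass = {k: v for k, v in links.items() if similarity(v) < threshold}
    rest = [v for k, v in links.items() if k not in bypass]
    if not rest:
        return bypass, None
    # Aggregate the similar links: average delay, bottleneck (minimum) bandwidth.
    return bypass, (sum(d for d, _ in rest) / len(rest), min(b for _, b in rest))


links = {
    ("A", "B"): (10, 100),
    ("A", "C"): (12, 90),
    ("B", "C"): (11, 110),
    ("A", "D"): (40, 20),
}
bypass, aggregated = select_bypass(links, threshold=0.5)
print(sorted(bypass), aggregated)  # [('A', 'D')] (11.0, 90)
```

The outlying A-D link is kept as a bypass link instead of distorting the aggregated values of the three similar links.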

Viewpoint Unconstrained Face Recognition Based on Affine Local Descriptors and Probabilistic Similarity

  • Gao, Yongbin;Lee, Hyo Jong
    • Journal of Information Processing Systems / v.11 no.4 / pp.643-654 / 2015
  • Face recognition under controlled settings, such as limited viewpoint and illumination change, can achieve good performance nowadays. However, real-world face recognition remains challenging. In this paper, we propose combining Affine Scale Invariant Feature Transform (SIFT) with Probabilistic Similarity for face recognition under large viewpoint changes. Affine SIFT is an extension of the SIFT algorithm that detects affine-invariant local descriptors. It generates a series of different viewpoints using affine transformations, which allows for a viewpoint difference between the gallery face and the probe face. However, the human face is not planar; it has significant 3D depth, so Affine SIFT does not work well under large pose changes. To complement it, we combine it with probabilistic similarity, which computes the log-likelihood between the probe and gallery faces based on a sum of squared differences (SSD) distribution learned in an offline process. Experiments on the FERET database show that our framework achieves markedly better recognition accuracy than the other algorithms compared.
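The probabilistic-similarity idea, scoring a probe-gallery pair by the log-likelihood of its SSD under a distribution learned offline, can be sketched as follows. The sketch assumes a Gaussian model of SSD values from matching (same-person) pairs and uses toy feature vectors; the Gaussian choice and all names are hypothetical, and the paper's actual distribution and features may differ.

```python
import math
from statistics import mean, pstdev


def ssd(x, y):
    """Sum of squared differences between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def fit_ssd_model(matching_ssds):
    """Offline step: fit a Gaussian (mean, std) to SSDs of same-person pairs."""
    return mean(matching_ssds), pstdev(matching_ssds)


def log_likelihood(value, mu, sigma):
    """Gaussian log-density of `value`."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (value - mu) ** 2 / (2 * sigma ** 2)


def probabilistic_similarity(probe, gallery, mu, sigma):
    return log_likelihood(ssd(probe, gallery), mu, sigma)


mu, sigma = fit_ssd_model([1.0, 2.0, 3.0])
probe = [0.0, 0.0]
gallery = {"person_1": [1.0, 1.0], "person_2": [3.0, 3.0]}
best = max(gallery, key=lambda g: probabilistic_similarity(probe, gallery[g], mu, sigma))
print(best)  # person_1
```

Under this Gaussian assumption the score peaks at the typical matching SSD rather than at zero, so a pair that is unusually close is not automatically ranked first.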

Study of the New Distance for Image Retrieval (새로운 이미지 거리를 통한 이미지 검색 방안 연구)

  • Lee, Sung Im;Lim, Jo Han;Cho, Young Min
    • Journal of Korean Institute of Industrial Engineers / v.40 no.4 / pp.382-387 / 2014
  • Image retrieval is the procedure of finding images based on the resemblance between a query image and all images in a collection. The crucial issue in retrieving images is how to define the similarity between images. In this paper, we propose a new similarity measure based on the distribution of color. We apply the new measure to retrieving two different types of images, wallpaper images and automobile logos, and compare its performance with that of other existing similarity measures.
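The abstract does not give the formula of the proposed color-distribution measure, but the general pattern of comparing color histograms can be shown with the classic histogram intersection of Swain and Ballard. The Python sketch below quantizes each RGB channel into a few levels; the bin count and function names are illustrative assumptions, and this is not the authors' new measure.

```python
from collections import Counter


def color_histogram(pixels, bins=4):
    """Normalized histogram of RGB pixels (0-255 per channel), `bins` levels per channel."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    return {color: c / len(pixels) for color, c in counts.items()}


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima: 1.0 for identical normalized histograms, 0.0 if disjoint."""
    return sum(min(h1[k], h2[k]) for k in h1.keys() & h2.keys())


query = [(255, 0, 0)] * 3 + [(0, 0, 255)]           # mostly red
candidate = [(250, 10, 5)] * 2 + [(0, 0, 250)] * 2  # half red, half blue
print(histogram_intersection(color_histogram(query), color_histogram(candidate)))  # 0.75
```

A retrieval system would compute such a score between the query and every stored image and return the highest-ranked images.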

Web Page Similarity based on Size and Frequency of Tokens (토큰 크기 및 출현 빈도에 기반한 웹 페이지 유사도)

  • Lee, Eun-Joo;Jung, Woo-Sung
    • Journal of Information Technology Services / v.11 no.4 / pp.263-275 / 2012
  • Web applications are becoming hard to maintain because of the high complexity and duplication of web pages. However, most research on code clones focuses on code fragments and targets a specific language. We therefore propose GSIM, a language-independent statistical approach that detects similar pages based on the scarcity and frequency of customized tokens. The tokens, obtained by splitting pages with a given set of separators, are defined as the atomic elements for calculating the similarity between two pages. In this paper, we give the domain definition for web applications and the algorithms for collecting tokens, building matrices, and calculating similarity. We also evaluated the method with our GSIM tool through experiments on open source code. The results show the applicability of the proposed method and the effects of parameters, such as threshold, toughness, and token length, on quality and performance.
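A token-based page comparison in the spirit of GSIM can be sketched in Python. Everything specific here is an assumption for illustration: the separator set, the minimum token length, an IDF-style weight standing in for "scarcity", and a weighted Jaccard score over token frequencies. The abstract does not give GSIM's formulas, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

SEPARATORS = r"[\s<>=\"'/;{}()]+"  # assumed separator set for markup-like page sources


def tokenize(page, separators=SEPARATORS, min_len=2):
    """Split a page on separator characters and drop tokens shorter than min_len."""
    return [t for t in re.split(separators, page) if len(t) >= min_len]


def scarcity_weights(pages_tokens):
    """IDF-style weight: tokens found in fewer pages get larger weights."""
    n = len(pages_tokens)
    df = Counter(t for toks in pages_tokens for t in set(toks))
    return {t: math.log((n + 1) / c) for t, c in df.items()}


def page_similarity(t1, t2, weights):
    """Weighted Jaccard: sum of w*min(f1, f2) over sum of w*max(f1, f2)."""
    f1, f2 = Counter(t1), Counter(t2)
    keys = f1.keys() | f2.keys()
    num = sum(weights.get(k, 0.0) * min(f1[k], f2[k]) for k in keys)
    den = sum(weights.get(k, 0.0) * max(f1[k], f2[k]) for k in keys)
    return num / den if den else 0.0


pages = [
    '<div class="menu"><a href="home.jsp">Home</a></div>',
    '<div class="menu"><a href="home.jsp">Main</a></div>',
    "<table><tr><td>report</td></tr></table>",
]
tokens = [tokenize(p) for p in pages]
w = scarcity_weights(tokens)
print(page_similarity(tokens[0], tokens[1], w) > page_similarity(tokens[0], tokens[2], w))  # True
```

Changing `min_len` or the separator set changes which tokens count toward similarity, which is the kind of parameter effect the abstract reports evaluating.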