• Title/Summary/Keyword: Similarity Distance


Korean Semantic Similarity Measures for the Vector Space Models

  • Lee, Young-In;Lee, Hyun-jung;Koo, Myoung-Wan;Cho, Sook Whan
    • Phonetics and Speech Sciences
    • /
    • v.7 no.4
    • /
    • pp.49-55
    • /
    • 2015
  • This paper argues that, in determining semantic similarity, Korean words should be recategorized with a focus on their semantic relation to ontology, in light of cross-linguistic morphological variation. In particular, it proposes that Korean semantic similarity be measured on three tracks: a human judgements track, a relatedness track, and a cross-part-of-speech (cross-POS) relations track. As demonstrated in Yang et al. (2015), GloVe, an unsupervised learning model used for semantic similarity, is applicable to Korean, with its performance compared against human judgement results. Based on this compatibility, it was further hypothesized that the model's performance would likely vary with the specific kinds of relations found in different languages. An attempt was made to analyze these relations in terms of two major Korean-specific categories involving lexical and cross-POS relations. The paper concludes that languages must be analyzed by varying methods so that semantic components across languages may allow varying semantic distance in vector space models.

An Incremental Similarity Computation Method in Agglomerative Hierarchical Clustering

  • Jung, Sung-young;Kim, Taek-soo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.11 no.7
    • /
    • pp.579-583
    • /
    • 2001
  • In clustering data in high-dimensional space, one difficulty is the time-consuming computation of vector similarities. The problem worsens for the agglomerative algorithm with the group-average link and mean centroid methods, because cluster similarities must be recomputed whenever a cluster center moves after a merging step. As a solution to this problem, we present an incremental method of similarity computation that substitutes scalar calculation for the time-consuming calculation of vector similarity, for several measures such as the squared distance, inner product, cosine, and minimum variance. Experimental results show that it speeds up clustering significantly for very high-dimensional data.
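For the squared Euclidean distance between cluster centroids, one well-known way to replace a vector computation with a scalar one is the Lance-Williams update for centroid linkage. The sketch below illustrates that general idea only; it is not taken from the paper, and the function names and sample clusters are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance, computed over every dimension (the costly path)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def merged_centroid(ci, ni, cj, nj):
    """Centroid of the cluster formed by merging clusters i and j (sizes ni, nj)."""
    return [(ni * x + nj * y) / (ni + nj) for x, y in zip(ci, cj)]


def incremental_sq_dist(d_ki, d_kj, d_ij, ni, nj):
    """Lance-Williams centroid update: squared distance from cluster k to the
    merged cluster, using only three already-known scalars and the sizes."""
    n = ni + nj
    return (ni / n) * d_ki + (nj / n) * d_kj - (ni * nj / n ** 2) * d_ij


# Hypothetical clusters: centroids and sizes.
ci, ni = [0.0, 0.0, 1.0], 2
cj, nj = [4.0, 2.0, 1.0], 3
ck = [1.0, 5.0, -2.0]

# Direct route: rebuild the merged centroid, then touch every dimension again.
direct = sq_dist(ck, merged_centroid(ci, ni, cj, nj))

# Incremental route: reuse the cached pairwise squared distances.
incremental = incremental_sq_dist(
    sq_dist(ck, ci), sq_dist(ck, cj), sq_dist(ci, cj), ni, nj
)

print(direct, incremental)
```

The incremental route costs a constant number of arithmetic operations per cluster pair regardless of the vector dimension, which is where the saving for very high-dimensional data would come from.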


Comparison of Code Similarity Analysis Performance of funcGNN and Siamese Network (funcGNN과 Siamese Network의 코드 유사성 분석 성능비교)

  • Choi, Dong-Bin;Jo, In-su;Park, Young B.
    • Journal of the Semiconductor & Display Technology
    • /
    • v.20 no.3
    • /
    • pp.113-116
    • /
    • 2021
  • As artificial intelligence technologies, including deep learning, develop, they are being introduced to code similarity analysis. The traditional analysis method converts source code into a control flow graph (CFG) and then calculates the graph edit distance (GED). Some studies instead calculate the GED from the converted CFG through a trained graph neural network (GNN), and methods that analyze code similarity through a CNN by rendering the CFG as an image are also being studied. In this paper, to determine which approach will be effective and efficient for future research on AI-based code similarity analysis, code similarity is measured with funcGNN, which measures code similarity using a GNN, and with a Siamese network, an image similarity analysis model, and their accuracy is compared and analyzed. The analysis shows that the error rate of the Siamese network (0.0458) was higher than that of funcGNN (0.0362).

Protein Structure Alignment Based on Maximum of Residue Pair Distance and Similarity Graph (정렬된 잔기 사이의 최대거리와 유사도 그래프에 기반한 단백질 구조 정렬)

  • Kim, Woo-Cheol;Park, Sang-Hyun;Won, Jung-Im
    • Journal of KIISE:Databases
    • /
    • v.34 no.5
    • /
    • pp.396-408
    • /
    • 2007
  • Since the Human Genome Project finished sequencing human DNA, interest in protein functions has been increasing. Since the structures of proteins are conserved in divergent evolution, their functions are determined by their structures rather than by their amino acid sequences. Therefore, if similarities between two protein structures are observed, we can expect them to have common biological functions. Much research on protein structure alignment has been performed so far. However, most of it uses RMSD (Root Mean Square Deviation) as a similarity measure, which makes it hard to judge the similarity level of two protein structures intuitively. In addition, these methods retrieve only the single result with the highest alignment score, which can hardly satisfy users with different purposes. To overcome these limitations, we propose a novel protein structure alignment algorithm based on MRPD (Maximum of Residue Pair Distance) and SG (Similarity Graph). MRPD is a more intuitive similarity measure that enables fast filtering of unpromising protein pairs, and SG is a compact representation of multiple alignment results from which users can choose the most plausible one for their needs, without compromising the time to align protein structures.
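The contrast between the two measures can be sketched as follows. This is a minimal illustration, assuming the two structures are already superposed and their residues paired one-to-one, and taking MRPD literally as the largest distance over aligned residue pairs, as the name suggests; the coordinates and function names are hypothetical and the paper's alignment algorithm is not reproduced.

```python
import math


def pair_distances(a, b):
    """Distance between each pair of aligned residues (already superposed)."""
    return [math.dist(p, q) for p, q in zip(a, b)]


def rmsd(a, b):
    """Root mean square deviation over all aligned pairs."""
    d = pair_distances(a, b)
    return math.sqrt(sum(x * x for x in d) / len(d))


def mrpd(a, b):
    """Maximum distance over all aligned residue pairs."""
    return max(pair_distances(a, b))


# Hypothetical aligned coordinates: three pairs coincide, one pair is 4 units apart.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 4.0, 0.0)]

print("RMSD:", rmsd(structure_a, structure_b))
print("MRPD:", mrpd(structure_a, structure_b))
```

RMSD averages the single large deviation away, while the maximum states directly that no aligned pair is farther apart than a given bound. A bound of that kind also lends itself to early rejection: a pair can be discarded as soon as one residue pair exceeds a threshold.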

MRI Image Retrieval Using Wavelet with Mahalanobis Distance Measurement

  • Rajakumar, K.;Muttan, S.
    • Journal of Electrical Engineering and Technology
    • /
    • v.8 no.5
    • /
    • pp.1188-1193
    • /
    • 2013
  • In a content-based image retrieval (CBIR) system, images are represented by features such as color, texture, shape, and spatial relationships. In this paper, we propose an MRI image retrieval method using the wavelet transform with Mahalanobis distance measurement. The wavelet transform can be easily extended to 2-D (image) or 3-D (volume) data by successively applying a 1-D transform along each dimension. The proposed algorithm has been tested using the wavelet transform, and performance analysis has been done with the HH and $H^*$ elimination methods. Retrieval is based on the relevance between a query image and each database image; relevance is ranked according to the closest similarity measures computed by the Mahalanobis distance. An adaptive similarity synthesis approach based on a linear combination of individual feature-level similarities is analyzed and presented in this paper. The feature weights are calculated by considering both the precision and recall rates of the top retrieved relevant images as predicted by our enhanced technique. Hence, to produce effective results, the weights are dynamically updated for a robust search process. The experimental results show that the proposed algorithm easily identifies the target object and reduces the influence of the background in the image, and thus improves the performance of MRI image retrieval.
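Ranking database images by Mahalanobis distance to a query can be sketched as below. This is a minimal standard-library illustration, assuming each image has already been reduced to a short feature vector and that a feature covariance matrix is available; the feature values, the covariance matrix, and all names are hypothetical, and the wavelet feature extraction and weight updating described in the abstract are not shown.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis(u, v, cov):
    """sqrt((u - v)^T cov^{-1} (u - v)), without forming the inverse explicitly."""
    diff = [p - q for p, q in zip(u, v)]
    y = solve(cov, diff)
    return math.sqrt(sum(d * t for d, t in zip(diff, y)))


# Hypothetical 2-D feature vectors; the first feature varies far more than the second.
cov = [[4.0, 0.0], [0.0, 1.0]]
query = [0.0, 0.0]
database = {"image_a": [2.0, 0.0], "image_b": [0.0, 1.5]}

# Smaller distance means a more relevant image.
ranked = sorted(database, key=lambda name: mahalanobis(query, database[name], cov))
print(ranked)
```

With plain Euclidean distance `image_b` (1.5) would rank ahead of `image_a` (2.0); scaling by the covariance reverses the order because a difference of 2 in a high-variance feature is less significant than a difference of 1.5 in a low-variance one.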

3D Shape Descriptor with Interatomic Distance for Screening the Molecular Database (분자 데이터베이스 스크리닝을 위한 원자간 거리 기반의 3차원 형상 기술자)

  • Lee, Jae-Ho;Park, Joon-Young
    • Korean Journal of Computational Design and Engineering
    • /
    • v.14 no.6
    • /
    • pp.404-414
    • /
    • 2009
  • In computational molecular analysis, 3D structural comparison for protein searching plays a very important role. As protein databases have grown rapidly in size, exhaustive search methods cannot provide satisfactory performance. Because exhaustive search methods handle the structure of a protein as a set of spheres converted from its set of atoms, calculating the similarity of two sphere sets is very expensive. Instead, the filter-and-refine paradigm offers an efficient alternative for database search without compromising the accuracy of the answers. Recently, a very fast algorithm based on inter-atomic distances was suggested by Ballester and Richards. Since it adopts the moments of the distribution of inter-atomic distances, which are rotationally invariant, it can eliminate the structure alignment and orientation fixing steps and perform searches faster than previous methods. In this paper, we propose a new 3D shape descriptor. It has the properties of a general shape distribution as well as properties useful in screening molecular databases. We show some experimental results to validate our method.
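The rotational invariance of distance-distribution moments can be checked with a small sketch. This is a simplified, centroid-only illustration of the idea the abstract describes, not the authors' descriptor and not the full algorithm it cites; the choice of moments, the similarity score, the atom coordinates, and all function names are assumptions made for the example.

```python
import math


def centroid(atoms):
    n = len(atoms)
    return tuple(sum(c) / n for c in zip(*atoms))


def distance_moments(atoms):
    """Mean, variance, and third central moment of atom-to-centroid distances."""
    c = centroid(atoms)
    d = [math.dist(a, c) for a in atoms]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / n
    third = sum((x - mean) ** 3 for x in d) / n
    return (mean, var, third)


def similarity(m1, m2):
    """Score in (0, 1]; equals 1 only when the two descriptors are identical."""
    diff = sum(abs(a - b) for a, b in zip(m1, m2)) / len(m1)
    return 1.0 / (1.0 + diff)


def rotate_z(atoms, angle):
    """Rotate all atoms about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in atoms]


# Hypothetical molecule, a rotated copy, and a copy stretched by a factor of 2.
molecule = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
rotated = rotate_z(molecule, 1.0)
stretched = [(2 * x, 2 * y, 2 * z) for x, y, z in molecule]

base = distance_moments(molecule)
print(similarity(base, distance_moments(rotated)))    # rotation leaves the descriptor unchanged
print(similarity(base, distance_moments(stretched)))  # a change of shape scale lowers the score
```

Because the descriptor depends only on distances, no alignment or orientation step is needed before comparing two molecules, which is what makes this kind of descriptor suitable as a fast filter.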

Application of Euclidean Distance Similarity for Smartphone-Based Moving Context Determination (스마트폰 기반의 이동상황 판별을 위한 유클리디안 거리유사도의 응용)

  • Jang, Young-Wan;Kim, Byeong Man;Jang, Sung Bong;Shin, Yoon Sik
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.19 no.4
    • /
    • pp.53-63
    • /
    • 2014
  • Moving context determination is an important issue to be resolved in mobile computing environments. This paper presents a method for recognizing and classifying a mobile user's moving context using Euclidean distance similarity. In the proposed method, basic data are gathered using Global Positioning System (GPS) and accelerometer sensors, and the system uses these data to decide which moving situation the user is in. The decided situation falls into one of four categories: stopped, walking, running, or riding in a car. To evaluate the effectiveness and feasibility of the proposed scheme, we implemented applications using several variations of Euclidean distance similarity on the Android system and measured their accuracy. Experimental results show that the proposed system achieves more than 90% accuracy.
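A nearest-reference classifier based on Euclidean distance similarity might look like the sketch below. It is only an illustration of the general technique: the two features (GPS speed and an accelerometer variance), the reference values for the four categories, and the similarity formula 1 / (1 + d) are assumptions, not the features or variations evaluated in the paper.

```python
import math

# Hypothetical reference vectors: (GPS speed in m/s, accelerometer variance).
REFERENCE = {
    "stop": (0.0, 0.0),
    "walking": (1.4, 1.0),
    "run": (3.0, 4.0),
    "car": (15.0, 0.5),
}


def euclidean_similarity(a, b):
    """Map Euclidean distance d to a similarity in (0, 1]; identical vectors give 1."""
    return 1.0 / (1.0 + math.dist(a, b))


def classify(sample):
    """Return the category whose reference vector is most similar to the sample."""
    return max(
        REFERENCE,
        key=lambda label: euclidean_similarity(sample, REFERENCE[label]),
    )


print(classify((1.2, 0.8)))
print(classify((14.0, 0.3)))
```

In practice the features would likely need scaling to comparable ranges before distances are computed, since otherwise the feature with the largest numeric range dominates the result.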

Word Similarity Calculation by Using the Edit Distance Metrics with Consonant Normalization

  • Kang, Seung-Shik
    • Journal of Information Processing Systems
    • /
    • v.11 no.4
    • /
    • pp.573-582
    • /
    • 2015
  • Edit distance metrics are widely used in applications such as string comparison and spelling error correction. Hamming distance is a metric for two equal-length strings, and Damerau-Levenshtein distance is a well-known metric for making spelling corrections through string-to-string comparison. Previous distance metrics seem to be appropriate for alphabetic languages like English and other European languages. However, the conventional edit distance criterion is not the best method for agglutinative languages like Korean. The reason is that two or more letter units make up a Korean character, which is called a syllable. This mechanism of syllable-based word construction in the Korean language makes edit distance calculation inefficient. We therefore explored a new edit distance method that uses consonant normalization and a normalization factor.
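The effect of syllable-based construction on edit distance can be seen by comparing distances at the syllable level and at the letter (jamo) level. The sketch below uses plain Levenshtein distance and Unicode NFD normalization, which decomposes precomposed Hangul syllables into their letters; it does not implement the paper's consonant normalization or normalization factor, and the example words and function names are chosen here purely for illustration.

```python
import unicodedata


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def to_syllables(word):
    """Precomposed form: one code point per Hangul syllable."""
    return unicodedata.normalize("NFC", word)


def to_jamo(word):
    """Decomposed form: one code point per letter (initial, medial, final)."""
    return unicodedata.normalize("NFD", word)


base, near, far = "한국", "한굴", "한말"

# At the syllable level both candidates look equally distant from the base word.
print(levenshtein(to_syllables(base), to_syllables(near)),
      levenshtein(to_syllables(base), to_syllables(far)))

# At the letter level a single changed final consonant is separated from a
# syllable in which every letter differs.
print(levenshtein(to_jamo(base), to_jamo(near)),
      levenshtein(to_jamo(base), to_jamo(far)))
```

A syllable-level comparison charges the same cost for a one-letter slip as for a completely different syllable, which is one reason a letter-aware distance with some normalization is attractive for Korean.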

Analysis of Fuzzy Entropy and Similarity Measure for Non Convex Membership Functions

  • Lee, Sang-H.;Kim, Sang-Jin
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.9 no.1
    • /
    • pp.4-9
    • /
    • 2009
  • Fuzzy entropy is designed for non-convex fuzzy membership functions using the well-known Hamming distance measure. The design procedure for convex fuzzy membership functions is presented through the distance measure, and a characteristic analysis of non-convex functions is also illustrated. A proof of the proposed fuzzy entropy is discussed, and entropy computation is illustrated.

A New Semantic Distance Measurement Method using TF-IDF in Linked Open Data (링크드 오픈 데이터에서 TF-IDF를 이용한 새로운 시맨틱 거리 측정 기법)

  • Cho, Jung-Gil
    • Journal of the Korea Convergence Society
    • /
    • v.11 no.10
    • /
    • pp.89-96
    • /
    • 2020
  • Linked Data allows structured data to be published in a standard way so that datasets from various domains can be interlinked. With the rapid evolution of Linked Open Data (LOD), researchers are exploiting it to solve particular problems such as semantic similarity assessment. In this paper, we propose a method, built on the basic concept of Linked Data Semantic Distance (LDSD), for calculating the Linked Data semantic distance between resources, which can be used in an LOD-based recommender system. The proposed semantic distance measurement model is based on a similarity measurement that combines the LOD-based semantic distance with a new link weight using TF-IDF, which is well known in the field of information retrieval. To verify the effectiveness of this approach, performance was evaluated in the context of an LOD-based recommender system using mixed data from DBpedia and MovieLens. Experimental results show that the proposed method achieves higher accuracy than other similar methods. In addition, it contributes to improving the accuracy of the recommender system by expanding the range of semantic distance calculation.