• Title/Summary/Keyword: Similarity measures

Search Result 304, Processing Time 0.027 seconds

Study of the New Distance for Image Retrieval (새로운 이미지 거리를 통한 이미지 검색 방안 연구)

  • Lee, Sung Im;Lim, Jo Han;Cho, Young Min
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.40 no.4
    • /
    • pp.382-387
    • /
    • 2014
  • Image retrieval is a procedure to find images based on the resemblance between query image and all images. In retrieving images, the crucial step that arises is how to define the similarity between images. In this paper, we propose a new similarity measure which is based on distribution of color. We apply the new measure to retrieving two different types of images, wallpaper images and the logo of automobiles, and compare its performance to other existing similarity measures.

Statistical Fingerprint Recognition Matching Method with an Optimal Threshold and Confidence Interval

  • Hong, C.S.;Kim, C.H.
    • The Korean Journal of Applied Statistics
    • /
    • v.25 no.6
    • /
    • pp.1027-1036
    • /
    • 2012
  • Among various biometrics recognition systems, statistical fingerprint recognition matching methods are considered using minutiae on fingerprints. We define similarity distance measures based on the coordinate and angle of the minutiae, and suggest a fingerprint recognition model following statistical distributions. We could obtain confidence intervals of similarity distance for the same and different persons, and optimal thresholds to minimize two kinds of error rates for distance distributions. It is found that the two confidence intervals of the same and different persons are not overlapped and that the optimal threshold locates between two confidence intervals. Hence an alternative statistical matching method can be suggested by using nonoverlapped confidence intervals and optimal thresholds obtained from the distributions of similarity distances.

Evaluation of Similarity Analysis of Newspaper Article Using Natural Language Processing

  • Ayako Ohshiro;Takeo Okazaki;Takashi Kano;Shinichiro Ueda
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.6
    • /
    • pp.1-7
    • /
    • 2024
  • Comparing text features involves evaluating the "similarity" between texts. It is crucial to use appropriate similarity measures when comparing similarities. This study utilized various techniques to assess the similarities between newspaper articles, including deep learning and a previously proposed method: a combination of Pointwise Mutual Information (PMI) and Word Pair Matching (WPM), denoted as PMI+WPM. For performance comparison, law data from medical research in Japan were utilized as validation data in evaluating the PMI+WPM method. The distribution of similarities in text data varies depending on the evaluation technique and genre, as revealed by the comparative analysis. For newspaper data, non-deep learning methods demonstrated better similarity evaluation accuracy than deep learning methods. Additionally, evaluating similarities in law data is more challenging than in newspaper articles. Despite deep learning being the prevalent method for evaluating textual similarities, this study demonstrates that non-deep learning methods can be effective regarding Japanese-based texts.

An Experimental Study on Selecting Association Terms Using Text Mining Techniques (텍스트 마이닝 기법을 이용한 연관용어 선정에 관한 실험적 연구)

  • Kim, Su-Yeon;Chung, Young-Mee
    • Journal of the Korean Society for information Management
    • /
    • v.23 no.3 s.61
    • /
    • pp.147-165
    • /
    • 2006
  • In this study, experiments for selection of association terms were conducted in order to discover the optimum method in selecting additional terms that are related to an initial query term. Association term sets were generated by using support, confidence, and lift measures of the Apriori algorithm, and also by using the similarity measures such as GSS, Jaccard coefficient, cosine coefficient, and Sokal & Sneath 5, and mutual information. In performance evaluation of term selection methods, precision of association terms as well as the overlap ratio of association terms and relevant documents' indexing terms were used. It was found that Apriori algorithm and GSS achieved the highest level of performances.

Gated Recurrent Unit Architecture for Context-Aware Recommendations with improved Similarity Measures

  • Kala, K.U.;Nandhini, M.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.2
    • /
    • pp.538-561
    • /
    • 2020
  • Recommender Systems (RecSys) have a major role in e-commerce for recommending products, which they may like for every user and thus improve their business aspects. Although many types of RecSyss are there in the research field, the state of the art RecSys has focused on finding the user similarity based on sequence (e.g. purchase history, movie-watching history) analyzing and prediction techniques like Recurrent Neural Network in Deep learning. That is RecSys has considered as a sequence prediction problem. However, evaluation of similarities among the customers is challenging while considering temporal aspects, context and multi-component ratings of the item-records in the customer sequences. For addressing this issue, we are proposing a Deep Learning based model which learns customer similarity directly from the sequence to sequence similarity as well as item to item similarity by considering all features of the item, contexts, and rating components using Dynamic Temporal Warping(DTW) distance measure for dynamic temporal matching and 2D-GRU (Two Dimensional-Gated Recurrent Unit) architecture. This will overcome the limitation of non-linearity in the time dimension while measuring the similarity, and the find patterns more accurately and speedily from temporal and spatial contexts. Experiment on the real world movie data set LDOS-CoMoDa demonstrates the efficacy and promising utility of the proposed personalized RecSys architecture.

A Similarity Ranking Algorithm for Image Databases (이미지 데이터베이스 유사도 순위 매김 알고리즘)

  • Cha, Guang-Ho
    • Journal of KIISE:Databases
    • /
    • v.36 no.5
    • /
    • pp.366-373
    • /
    • 2009
  • In this paper, we propose a similarity search algorithm for image databases. One of the central problems regarding content-based image retrieval (CBIR) is the semantic gap between the low-level features computed automatically from images and the human interpretation of image content. Many search algorithms used in CBIR have used the Minkowski metric (or $L_p$-norm) to measure similarity between image pairs. However those functions cannot adequately capture the aspects of the characteristics of the human visual system as well as the nonlinear relationships in contextual information. Our new search algorithm tackles this problem by employing new similarity measures and ranking strategies that reflect the nonlinearity of human perception and contextual information. Our search algorithm yields superior experimental results on a real handwritten digit image database and demonstrates its effectiveness.

Patent Document Similarity Based on Image Analysis Using the SIFT-Algorithm and OCR-Text

  • Park, Jeong Beom;Mandl, Thomas;Kim, Do Wan
    • International Journal of Contents
    • /
    • v.13 no.4
    • /
    • pp.70-79
    • /
    • 2017
  • Images are an important element in patents and many experts use images to analyze a patent or to check differences between patents. However, there is little research on image analysis for patents partly because image processing is an advanced technology and typically patent images consist of visual parts as well as of text and numbers. This study suggests two methods for using image processing; the Scale Invariant Feature Transform(SIFT) algorithm and Optical Character Recognition(OCR). The first method which works with SIFT uses image feature points. Through feature matching, it can be applied to calculate the similarity between documents containing these images. And in the second method, OCR is used to extract text from the images. By using numbers which are extracted from an image, it is possible to extract the corresponding related text within the text passages. Subsequently, document similarity can be calculated based on the extracted text. Through comparing the suggested methods and an existing method based only on text for calculating the similarity, the feasibility is achieved. Additionally, the correlation between both the similarity measures is low which shows that they capture different aspects of the patent content.

Similarity Measure Between Interval-valued Vague Sets (구간값 모호집합 사이의 유사척도)

  • Cho, Sang-Yeop
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.19 no.5
    • /
    • pp.603-608
    • /
    • 2009
  • In this paper, a similarity measure between interval-valued vague sets is proposed. In the interval-valued vague sets representation, the upper bound and the lower bound of a vague set are represented as intervals of interval-valued fuzzy set respectively. Proposed method combines the concept of geometric distance and the center-of-gravity point of interval-valued vague set to evaluate the degree of similarity between interval-valued vague sets. We also prove three properties of the proposed similarity measure. It provides a useful way to measure the degree of similarity between interval-valued vague sets.

User Similarity-based Path Prediction Method (사용자 유사도 기반 경로 예측 기법)

  • Nam, Sumin;Lee, Sukhoon
    • The Journal of Korean Institute of Information Technology
    • /
    • v.17 no.12
    • /
    • pp.29-38
    • /
    • 2019
  • A path prediction method using lifelog requires a large amount of training data for accurate path prediction, and the path prediction performance is degraded when the training data is insufficient. The lack of training data can be solved using data of other users having similar user movement patterns. Therefore, this paper proposes a path prediction algorithm based on user similarity. The proposed algorithm learns the path in a triple grid pattern and measures the similarity between users using the cosine similarity technique. Then, it predicts the path with applying measured similarity to the learned model. For the evaluation, we measure and compare the path prediction accuracy of proposed method with the existing algorithms. As a result, the proposed method has 66.6% accuracy, and it is evaluated that its accuracy is 1.8% higher than other methods.

Comparative Study on Similarity Measurement Methods in CBR Cost Estimation

  • Ahn, Joseph;Park, Moonseo;Lee, Hyun-Soo;Ahn, Sung Jin;Ji, Sae-Hyun;Kim, Sooyoung;Song, Kwonsik;Lee, Jeong Hoon
    • International conference on construction engineering and project management
    • /
    • 2015.10a
    • /
    • pp.597-598
    • /
    • 2015
  • In order to improve the reliability of cost estimation results using CBR, there has been a continuous issue on similarity measurement to accurately compute the distance among attributes and cases to retrieve the most similar singular or plural cases. However, these existing similarity measures have limitations in taking the covariance among attributes into consideration and reflecting the effects of covariance in computation of distances among attributes. To deal with this challenging issue, this research examines the weighted Mahalanobis distance based similarity measure applied to CBR cost estimation and carries out the comparative study on the existing distance measurement methods of CBR. To validate the suggest CBR cost model, leave-one-out cross validation (LOOCV) using two different sets of simulation data are carried out. Consequently, this research is expected to provide an analysis of covariance effects in similarity measurement and a basis for further research on the fundamentals of case retrieval.

  • PDF