• Title/Summary/Keyword: Similarity measures

Catchment Similarity Assessment Based on Catchment Characteristics of GIS in Geum River Catchments, Korea (금강 유역을 대상으로 한 GIS 기반의 유역의 유사성 평가)

  • Lee, Hyo Sang;Park, Ki Soon;Jung, Sung Heuk;Choi, Seuk Keun
    • Journal of Korean Society for Geospatial Information Science / v.21 no.3 / pp.37-46 / 2013
  • A similarity measure for catchments is essential for regionalization studies, which provide in-depth analysis of hydrological response and flood estimation at ungauged catchments. However, such a measure is often biased toward the selected catchments and is not clearly explained in a hydrological sense. This study applied a hydrological similarity distance measure, that of the Flood Estimation Handbook, to 25 Geum River catchments in Korea. Three catchment characteristics, area (A), annual precipitation (SAAR), and SCS curve number (CN), are used in the Euclidean distance measure. Furthermore, six indices of the flow duration curve (FDC) are applied in a clustering analysis using SPSS. Grouping by the hydrological similarity measure suggests three groups (H1, H2 and H3), and four catchments are left ungrouped. The clustering analysis of the FDC provides four groups: F1, F2, F3 and F4. Six of the seven catchments of H1 are grouped in F1, while Sangyeogyo is grouped in F2. Four of the six catchments of H2 are grouped in F2, while Cheongju and Guryong are grouped in F1. The catchments of H3 are categorized in F1. The authors compare the results (H1, H2 and H3) of the similarity measure based on catchment physical descriptors with the results (F1 and F2) of clustering based on catchment hydrological response. The results of the hydrological similarity measure are supported by the clustering analysis of the FDC. This study shows the potential of hydrological catchment similarity measures in Korea.
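The distance computation described in this abstract can be sketched roughly as follows. This is only an illustration: the abstract does not state how the three descriptors are scaled or weighted, so the z-score standardization, the catchment names, and all numeric values below are assumptions, not the authors' data or procedure.

```python
import math

def standardize(rows):
    """Z-score each column so area, rainfall and CN share a common scale (assumed scaling)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def euclidean(a, b):
    """Plain Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical catchments: (area in km^2, SAAR in mm, CN). Values are made up.
catchments = {
    "C1": (100.0, 1200.0, 70.0),
    "C2": (110.0, 1250.0, 72.0),
    "C3": (900.0, 1400.0, 60.0),
}
names = list(catchments)
z = dict(zip(names, standardize([catchments[n] for n in names])))

d12 = euclidean(z["C1"], z["C2"])  # two similar catchments
d13 = euclidean(z["C1"], z["C3"])  # a much larger, wetter catchment
print(d12 < d13)
```

Catchments with a small mutual distance would be candidates for the same similarity group; the grouping threshold is not given in the abstract and is left out here.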

Utilization of similarity measures by PIM with AMP as association rule thresholds (모든 주변 비율을 고려한 확률적 흥미도 측도 기반 유사성 측도의 연관성 평가 기준 활용 방안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.24 no.1 / pp.117-124 / 2013
  • Association rule mining is a data mining technique that quantifies the relationship between sets of items in a large database, and it has been applied in various fields such as internet shopping malls, healthcare, insurance, and education. There are three primary interestingness measures for association rules: support, confidence, and lift. Confidence is the most important of these, and association rules are usually generated using confidence. However, it is an asymmetric measure and takes only positive values, which can cause difficulties when generating association rules. In this paper we apply similarity measures by the probabilistic interestingness measure (PIM) with all marginal proportions (AMP) to solve this problem. A numerical example compares support, confidence, lift, chi-square statistics, and several similarity measures by PIM with AMP. The results show that the similarity measures by PIM with AMP indicate the degree of association in the same way as confidence. Moreover, because their values carry a sign, they also reveal the direction of association, and the best similarity measure by PIM with AMP can be selected.
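For readers unfamiliar with the three standard measures named in this abstract, a minimal sketch follows. It uses the textbook definitions of support, confidence, and lift; the function name `rule_measures` and the toy transactions are illustrative assumptions, and the PIM-with-AMP measures themselves are not reproduced because the abstract does not define them.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    count_a = sum(1 for t in transactions if antecedent <= t)
    count_b = sum(1 for t in transactions if consequent <= t)
    count_ab = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = count_ab / n               # P(A and B)
    confidence = count_ab / count_a      # P(B | A); assumes A occurs at least once
    lift = confidence / (count_b / n)    # P(B | A) / P(B)
    return support, confidence, lift

# Made-up market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread"},
    {"bread", "milk"},
    {"bread", "butter"},
]

s_ab, c_ab, l_ab = rule_measures(transactions, {"bread"}, {"milk"})
s_ba, c_ba, l_ba = rule_measures(transactions, {"milk"}, {"bread"})
print(c_ab, c_ba)  # confidence differs by rule direction
```

Support and lift come out the same in both directions, while confidence does not, which is the asymmetry the abstract refers to.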

A Study of CBIR(Content-based Image Retrieval) Computer-aided Diagnosis System of Breast Ultrasound Images using Similarity Measures of Distance (거리 기반 유사도 측정을 통한 유방 초음파 영상의 내용 기반 검색 컴퓨터 보조 진단 시스템에 관한 연구)

  • Kim, Min-jeong;Cho, Hyun-chong
    • The Transactions of The Korean Institute of Electrical Engineers / v.66 no.8 / pp.1272-1277 / 2017
  • Computer-aided diagnosis (CADx) systems have been studied to assist radiologists in characterizing breast masses. A CADx system can improve the diagnostic accuracy of radiologists by providing objective information about breast masses. Morphological and texture features were extracted from breast ultrasound images. Based on the extracted features, the CADx system retrieves masses that are similar to a query mass from a reference library using a k-nearest neighbor (k-NN) approach. Eight distance-based similarity measures are evaluated by ROC (receiver operating characteristic) analysis: Euclidean and Chebyshev (Minkowski family); Canberra and Lorentzian ($F_2$ family); Wave Hedges and Motyka (Intersection family); and Cosine and Dice (Inner Product family). The Inner Product family measures used with the k-NN classifier provided slightly higher performance in classifying malignant and benign masses than the Minkowski, $F_2$, and Intersection family measures.
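The retrieval step can be illustrated with a small sketch that plugs interchangeable distance functions into a k-NN vote. Only four of the eight measures are shown, using their common textbook forms; the feature vectors, labels, and function names are hypothetical, and the paper's actual features and evaluation protocol are not reproduced.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

def canberra(p, q):
    # Terms whose denominator is zero are skipped (a common convention).
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(p, q) if abs(a) + abs(b) > 0)

def cosine_distance(p, q):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

def knn_label(query, library, k, dist):
    """Majority label among the k library entries closest to the query."""
    ranked = sorted(library, key=lambda item: dist(query, item[0]))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)

# Hypothetical reference library of (feature vector, label) pairs.
library = [
    ((0.1, 0.2), "benign"),
    ((0.2, 0.1), "benign"),
    ((0.9, 0.8), "malignant"),
    ((0.8, 0.9), "malignant"),
]
print(knn_label((0.85, 0.85), library, k=3, dist=euclidean))
```

Passing a different `dist` function is all that is needed to compare measures on the same library.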

Applying Different Similarity Measures based on Jaccard Index in Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information / v.26 no.5 / pp.47-53 / 2021
  • Sparse ratings data hinder reliable similarity computation between users, which degrades the performance of memory-based collaborative filtering techniques for recommender systems. Many methods have been developed to address this data sparsity problem, the simplest and most representative of which use the Jaccard index. This index reflects the number of items rated in common by two users and is usually integrated into traditional similarity measures to compute the similarity between users more accurately. However, such integration is straightforward and takes no account of the degree of data sparsity. This study suggests a novel idea of applying different similarity measures depending on the numeric value of the Jaccard index between two users. Performance experiments are conducted to obtain optimal values of the parameters used by the proposed method and to evaluate it in comparison with other relevant methods. As a result, the proposed method demonstrates the best or comparable performance in prediction and recommendation accuracy.
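The general idea of switching measures by Jaccard value might look like the sketch below. The abstract does not say which measures are applied in which range or what the thresholds are, so the 0.3 threshold, the fallback to the raw Jaccard value, and the use of cosine similarity over common items are all assumptions made purely for illustration.

```python
import math

def jaccard(ratings_u, ratings_v):
    """Jaccard index over the sets of items each user has rated."""
    items_u, items_v = set(ratings_u), set(ratings_v)
    union = items_u | items_v
    return len(items_u & items_v) / len(union) if union else 0.0

def cosine_common(ratings_u, ratings_v):
    """Cosine similarity restricted to commonly rated items (ratings assumed positive)."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def switched_similarity(ratings_u, ratings_v, threshold=0.3):
    """Hypothetical rule: sparse overlap -> Jaccard itself, otherwise cosine."""
    j = jaccard(ratings_u, ratings_v)
    if j < threshold:
        return j
    return cosine_common(ratings_u, ratings_v)

# Made-up user rating dictionaries: item -> rating.
alice = {"a": 5, "b": 3, "c": 4}
bob = {"a": 4, "b": 2, "d": 5}
carol = {"a": 5, "d": 1, "e": 2, "f": 3}

print(jaccard(alice, bob), switched_similarity(alice, carol))
```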

A New Similarity Measure based on Separation of Common Ratings for Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information / v.26 no.11 / pp.149-156 / 2021
  • Among the various implementation techniques for recommender systems, collaborative filtering selects nearest neighbors with high similarity based on past rating history and recommends products preferred by them; it has been successfully used by many commercial sites. Accurate estimation of similarity is an important factor that determines the performance of the system. Various similarity measures have been developed, mostly by integrating traditional similarity measures with several existing indices. This study suggests a similarity measure based on a novel approach. It separates the common rating area between two users by the magnitude of the ratings, estimates similarity for each subarea, and integrates the results with weights. This makes it possible to identify similar subareas and reflect them in the final similarity value. Performance evaluation using two open datasets shows that the proposed measure outperforms previous ones in terms of prediction accuracy, rank accuracy, and mean average precision, especially on the dense dataset. The proposed similarity measure is expected to be used in various commercial systems to recommend products better suited to user preferences.

Mutual Information Analysis with Similarity Measure

  • Wang, Hong-Mei;Lee, Sang-Hyuk
    • International Journal of Fuzzy Logic and Intelligent Systems / v.10 no.3 / pp.218-223 / 2010
  • Relative mutual information is discussed and analyzed through fuzzy entropy and a similarity measure. The fuzzy relative mutual information measure (FRIM) plays an important part as a measure of the information shared between two fuzzy pattern vectors. The FRIM is analyzed and explained through the similarity measure between two fuzzy sets. Furthermore, the two measures are compared.

A New Similarity Measure using Fuzzy Logic for User-based Collaborative Filtering (사용자 기반의 협력필터링을 위한 퍼지 논리를 이용한 새로운 유사도 척도)

  • Lee, Soojung
    • The Journal of Korean Association of Computer Education / v.21 no.5 / pp.61-68 / 2018
  • Collaborative filtering is a fundamental technique implemented in many commercial recommender systems and provides a successful service to online users. This technique recommends items by referring to other users whose rating records are similar to those of the current user. Hence, similarity measures critically affect system performance. This study addresses problems of previous similarity measures and suggests a new one. The proposed measure uses fuzzy logic to reflect the subjectivity or vagueness of user ratings and the users' rating behavior. Experimental studies for performance evaluation show that the proposed measure yields marked improvements in prediction accuracy and recommendation accuracy.

A Study of Similarity Measures on Multidimensional Data Sequences Using Semantic Information (의미 정보를 이용한 다차원 데이터 시퀀스의 유사성 척도 연구)

  • Lee, Seok-Lyong;Lee, Ju-Hong;Chun, Seok-Ju
    • The KIPS Transactions:PartD / v.10D no.2 / pp.283-292 / 2003
  • One-dimensional time-series data have been studied in various database applications such as data mining and data warehousing. However, in the current complex business environment, multidimensional data sequences (MDSs) are becoming increasingly important in addition to one-dimensional time-series data. For example, a video stream can be modeled as an MDS in a multidimensional space with respect to color and texture attributes. In this paper, we propose effective similarity measures on which similar-pattern retrieval is based. An MDS is partitioned into segments, each of which is represented by various geometric and semantic features. The similarity measures are defined on the basis of these segments. Using the measures, segments irrelevant to a given query are pruned from the database. Both data sequences and query sequences are partitioned into segments, and query processing is based on comparing the features of data and query segments instead of scanning all data elements of entire sequences.

Development of the 1st-Order Similarity Measure and the 2nd-Order Similarity Measure Based on the Least-Squares Method (최소 자승법에 의한 1차 유사도 및 2차 유사도의 개발)

  • 강환일;석민수
    • Journal of the Korean Institute of Telematics and Electronics / v.20 no.6 / pp.23-28 / 1983
  • Two measures of similarity between contours, the 1st-order similarity measure and the 2nd-order similarity measure, are proposed. They are based on the residual errors of a least-squares fit. In particular, the 2nd-order similarity measure is reliable for contours with many kinds of variation, such as imperfection, affine transformation, or combinations of these. Through experiments on aircraft identification and recognition, we show that the 2nd-order similarity measure outperforms in matching not only the 1st-order similarity measure but also previous matching techniques.

A Study on Decision Tree for Multiple Binary Responses

  • Lee, Seong-Keon
    • Communications for Statistical Applications and Methods / v.10 no.3 / pp.971-980 / 2003
  • The tree method can be extended to multivariate responses, such as repeated measures and longitudinal data, by modifying the split function to accommodate multiple responses. Decision trees for multiple responses have been constructed by Segal (1992) and Zhang (1998). Segal suggested a tree that can analyze continuous longitudinal responses using Mahalanobis distance as the within-node homogeneity measure, and Zhang suggested a tree that can analyze multiple binary responses using a generalized entropy criterion, which is proportional to the maximum likelihood of the joint distribution of the multiple binary responses. In this paper, we modify the CART procedure and suggest a new tree-based method that can analyze multiple binary responses using similarity measures.