• Title/Abstract/Keywords: Similarity measures

Performance Analysis of Similarity Reflecting Jaccard Index for Solving Data Sparsity in Collaborative Filtering (협력필터링의 데이터 희소성 해결을 위한 자카드 지수 반영의 유사도 성능 분석)

  • Lee, Soojung
    • The Journal of Korean Association of Computer Education / Vol. 19, No. 4 / pp. 59-66 / 2016
  • Researchers have studied reflecting the number of co-rated items as a way to solve the data sparsity problem in collaborative filtering systems. The well-known Jaccard index improved performance when combined with previous similarity measures. However, few studies have analyzed how much it improves existing similarity measures in various data environments, and that analysis is the objective of this study. Used alone as a similarity measure, the Jaccard index yielded much higher prediction quality than traditional measures, and very high recommendation quality in a sparse dataset. In general, previous similarity measures combined with the Jaccard index improved performance regardless of dataset characteristics. In particular, cosine similarity achieved the highest improvement in sparse datasets, while Mean Squared Difference (MSD) similarity degraded prediction quality in denser datasets. Therefore, the characteristics of the data environment and of the similarity measure should be considered before combining the Jaccard index with it.
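
The following is a minimal sketch of combining the Jaccard index over rated-item sets with a conventional measure. It assumes that ratings are stored as per-user dicts and that the two measures are combined multiplicatively (cosine × Jaccard), which is one common choice. The function names and the combination rule are illustrative assumptions, not the paper's exact formulation.

```python
import math


def jaccard(ru, rv):
    """Jaccard index of two users' rated-item sets: |Iu & Iv| / |Iu | Iv|."""
    union = set(ru) | set(rv)
    if not union:
        return 0.0
    return len(set(ru) & set(rv)) / len(union)


def cosine(ru, rv):
    """Cosine similarity computed over co-rated items only (ratings assumed > 0)."""
    common = set(ru) & set(rv)
    if not common:
        return 0.0
    dot = sum(ru[i] * rv[i] for i in common)
    nu = math.sqrt(sum(ru[i] ** 2 for i in common))
    nv = math.sqrt(sum(rv[i] ** 2 for i in common))
    return dot / (nu * nv)


def cosine_jaccard(ru, rv):
    """Assumed combination: cosine similarity weighted by the Jaccard index."""
    return cosine(ru, rv) * jaccard(ru, rv)


# Hypothetical users: item id -> rating; a missing key means "not rated".
alice = {"i1": 5, "i2": 3, "i3": 4}
bob = {"i1": 5, "i2": 3, "i4": 1}
print(jaccard(alice, bob), cosine(alice, bob), cosine_jaccard(alice, bob))
```

Both users rated their two shared items identically, so cosine alone reports near-perfect similarity. The Jaccard factor (2 shared items out of 4) halves that score, which is the kind of correction for few co-rated items that the study evaluates.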

Information Management by Data Quantification with Fuzzy Entropy and Similarity Measure

  • Siang, Chua Hong;Lee, Sanghyuk
    • Journal of the Korea Convergence Society / Vol. 4, No. 2 / pp. 35-41 / 2013
  • Data management using fuzzy entropy and a similarity measure is discussed and verified by applying it to a reliable data selection problem. To calculate the certainty or uncertainty of data, a fuzzy entropy and a similarity measure are designed and proved. The proposed fuzzy entropy and similarity measure are regarded as a dissimilarity measure and a similarity measure, respectively, and the relation between the two measures is explained through graphical illustration. The obtained measures are useful in applications of decision theory and in mutual information analysis. Extensions of the data quantification results based on the proposed measures are applicable to decision making and fuzzy game theory.
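
The abstract does not give the proposed formulas, so the sketch below substitutes a common pair as an assumption. The similarity is Hamming-distance-based: s(A, B) = 1 - (1/n) * sum |mu_A(x) - mu_B(x)|. The fuzzy entropy is defined as the dissimilarity between A and its nearest crisp set. The pair shows how an entropy can act as a dissimilarity measure while the similarity plays the complementary role. All names are hypothetical.

```python
def similarity(mu_a, mu_b):
    """Assumed Hamming-based similarity: 1 - (1/n) * sum |mu_A(x) - mu_B(x)|."""
    n = len(mu_a)
    return 1.0 - sum(abs(a - b) for a, b in zip(mu_a, mu_b)) / n


def nearest_crisp(mu_a):
    """Nearest crisp set: each membership rounded to 0 or 1 (0.5 goes to 1)."""
    return [1.0 if m >= 0.5 else 0.0 for m in mu_a]


def fuzzy_entropy(mu_a):
    """Assumed entropy as dissimilarity to the nearest crisp set.

    Since |mu - nearest| = min(mu, 1 - mu), this equals (2/n) * sum min(mu, 1 - mu),
    which is 0 for a crisp set and 1 when every membership is 0.5.
    """
    return 2.0 * (1.0 - similarity(mu_a, nearest_crisp(mu_a)))


# Hypothetical membership values of three candidate data sets.
for name, mu in {"d1": [0.9, 0.1, 1.0], "d2": [0.6, 0.4, 0.5]}.items():
    print(name, fuzzy_entropy(mu))
```

Under these assumptions, a lower entropy indicates data closer to being crisp (more certain). Ranking candidates by this value is one way to read "reliable data selection", though the paper's exact criterion is not stated.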

Comparative Study on the Measures of Similarity for the Location Template Matching(LTM) Method (Location Template Matching(LTM) 방법에 사용되는 유사성 척도들의 비교 연구)

  • Shin, Kihong
    • Transactions of the Korean Society for Noise and Vibration Engineering / Vol. 24, No. 4 / pp. 310-316 / 2014
  • The location template matching (LTM) method is a technique for identifying an impact location on a structure, and it requires a measure of similarity between two time signals. The correlation coefficient is widely used as the measure of similarity, while a group delay based method has recently been proposed to improve the accuracy of impact localization. Another possible measure is the frequency response assurance criterion (FRAC), although it has not yet been applied for this purpose. In this paper, these three measures of similarity are examined comparatively using experimental data in order to understand their properties. The comparative study shows that the correlation coefficient and the FRAC give almost the same information, while the group delay based method gives shape-oriented information that is best suited to the location template matching method.
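
The following is an illustrative stdlib-only sketch of two of the compared measures: the correlation coefficient between a stored template signal and a measured signal, and a FRAC computed on their DFT spectra. It assumes the spectra of the time signals stand in for frequency response functions and that FRAC is taken over all DFT lines. The signals and function names are hypothetical, and the group delay based method is not shown.

```python
import cmath
import math


def correlation(x, y):
    """Pearson correlation coefficient between two equal-length time signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dft(x):
    """Naive O(n^2) discrete Fourier transform, so the sketch needs only the stdlib."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def frac(x, y):
    """FRAC of the spectra of x and y:
    |sum Hx * conj(Hy)|^2 / (sum |Hx|^2 * sum |Hy|^2), over all DFT lines."""
    hx, hy = dft(x), dft(y)
    cross = sum(a * b.conjugate() for a, b in zip(hx, hy))
    px = sum(abs(a) ** 2 for a in hx)
    py = sum(abs(b) ** 2 for b in hy)
    return abs(cross) ** 2 / (px * py)


template = [0.0, 1.0, 0.0, -1.0, 0.5, -0.5]   # hypothetical stored impact response
measured = [-2.0 * v for v in template]        # same shape, scaled and polarity-inverted
print(correlation(template, measured), frac(template, measured))
```

The correlation comes out near -1 (it is sign-sensitive), while FRAC comes out near 1. When all DFT lines are used, Parseval's theorem makes FRAC equal to (sum x*y)^2 / (sum x^2 * sum y^2), which is the squared correlation for zero-mean signals. That identity is consistent with the finding that the two measures carry almost the same information. Restricting FRAC to a frequency band breaks the exact equality.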

Comparative Study on the Measures of Similarity for the Location Template Matching (LTM) Method (Location Template Matching(LTM) 방법에 사용되는 유사성 척도들의 비교 연구)

  • Shin, Kihong
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2014 Spring Conference / pp. 506-511 / 2014
  • The location template matching (LTM) method is a technique for identifying an impact location on a structure, and it requires a measure of similarity between two time signals. The correlation coefficient is widely used as the measure of similarity, while a group delay based method has recently been proposed to improve the accuracy of impact localization. Another possible measure is the frequency response assurance criterion (FRAC), although it has not yet been applied for this purpose. In this paper, these three measures of similarity are examined comparatively using experimental data in order to understand their properties. The comparative study shows that the correlation coefficient and the FRAC give almost the same information, while the group delay based method gives shape-oriented information that is best suited to the location template matching method.

Survey on Vector Similarity Measures : Focusing on Algebraic Characteristics (대수적 특성을 고려한 벡터 유사도 측정 함수의 고찰)

  • Lee, Dongjoo;Shim, Junho
    • The Journal of Society for e-Business Studies / Vol. 17, No. 4 / pp. 209-219 / 2012
  • Objects such as products, product reviews, and user profiles are important in the e-commerce domain. Vectors are one of the most widely used object representation schemes: information about e-commerce objects may be modeled by vectors in which feature values are assigned to various dimensions. E-commerce objects are generally large in number, and some are similar or even identical in reality. Measuring the similarity between objects therefore plays an important role. In this paper, we survey state-of-the-art vector similarity measures. The measures are analyzed to characterize their algebraic properties and the relationships among them, and we classify related measures accordingly. We then present the features that standard vector similarity measures should convey.
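
To make "algebraic characteristics" concrete, the sketch below checks two such properties, symmetry and invariance to scaling one vector, for three familiar measures. The measures are cosine, a Euclidean-distance-based similarity mapped into (0, 1] via 1 / (1 + d) (one common convention, assumed here), and the Tanimoto (extended Jaccard) coefficient. The vectors are made up, and the survey's own classification scheme is not reproduced.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine of the angle between a and b."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def euclidean_similarity(a, b):
    """Euclidean distance d mapped into (0, 1] as 1 / (1 + d) (assumed convention)."""
    return 1.0 / (1.0 + math.dist(a, b))


def tanimoto(a, b):
    """Extended Jaccard (Tanimoto): a.b / (|a|^2 + |b|^2 - a.b)."""
    ab = dot(a, b)
    return ab / (dot(a, a) + dot(b, b) - ab)


a, b = [1.0, 2.0, 0.0], [2.0, 1.0, 1.0]
scaled_b = [3.0 * x for x in b]   # same direction as b, three times the length
for name, f in [("cosine", cosine), ("euclidean", euclidean_similarity), ("tanimoto", tanimoto)]:
    symmetric = math.isclose(f(a, b), f(b, a))
    scale_invariant = math.isclose(f(a, b), f(a, scaled_b))
    print(name, symmetric, scale_invariant)
```

All three measures are symmetric. Only cosine ignores vector length, which matters when, for example, two user profiles differ mainly in how many ratings they contain.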

Using User Rating Patterns for Selecting Neighbors in Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information / Vol. 24, No. 9 / pp. 77-82 / 2019
  • Collaborative filtering is a popular technique for recommender systems and is used in many practical commercial systems. Its basic principle is to select neighbors similar to the current user and to make recommendations for that user from the neighbors' past preferences for items. One of the major problems inherent in this type of system is the sparsity of rating data. This problem arises mainly from the underlying similarity measures, which select neighbors based on rating records. This paper addresses the problem and suggests a new similarity measure. The proposed method takes users' rating patterns into account when computing similarity, rather than relying only on commonly rated items as previous measures do. Performance experiments are conducted with various existing measures, and their performance is compared in terms of major performance metrics. As a result, the proposed measure achieves better or comparable results on all the metrics considered.
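
The basic neighbor-based principle described above can be sketched as follows. The sketch uses Pearson correlation over co-rated items as a conventional baseline measure (not the paper's proposed rating-pattern measure), picks the top-k neighbors who rated the target item, and makes a mean-centered weighted prediction. Users, items, and function names are hypothetical.

```python
import math


def pearson(ru, rv):
    """Pearson correlation over co-rated items; 0 when fewer than 2 are shared."""
    common = set(ru) & set(rv)
    if len(common) < 2:
        return 0.0
    mu = sum(ru[i] for i in common) / len(common)
    mv = sum(rv[i] for i in common) / len(common)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    den = math.sqrt(sum((ru[i] - mu) ** 2 for i in common) *
                    sum((rv[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0


def predict(target, others, item, k=2):
    """Predict target's rating for item from the k most similar users who rated it."""
    candidates = [(pearson(target, r), r) for r in others if item in r]
    neighbors = sorted(candidates, key=lambda p: p[0], reverse=True)[:k]
    mean_t = sum(target.values()) / len(target)
    # Weighted sum of each neighbor's deviation from their own mean rating.
    num = sum(s * (r[item] - sum(r.values()) / len(r)) for s, r in neighbors)
    den = sum(abs(s) for s, _ in neighbors)
    return mean_t + num / den if den else mean_t


target = {"a": 5, "b": 3, "c": 4}
u1 = {"a": 5, "b": 3, "c": 4, "d": 5}
u2 = {"a": 1, "b": 5, "c": 2, "d": 1}
u3 = {"a": 4, "d": 3}   # shares only one rated item with target
print(predict(target, [u1, u2, u3], "d", k=2))
```

User u3 receives a similarity of 0 simply because it has only one co-rated item with the target. This illustrates how measures that depend on commonly rated items break down under sparse data, the weakness the proposed rating-pattern measure is meant to address.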

Exploration of PIM based similarity measures as association rule thresholds (확률적 흥미도를 이용한 유사성 측도의 연관성 평가 기준)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / Vol. 23, No. 6 / pp. 1127-1135 / 2012
  • Association rule mining is a method for quantifying the relationships between sets of items in a large database, and the exploration of association rules is one of the well-studied problems in data mining. There are three primary quality measures for association rules: support, confidence, and lift. We generate association rules using confidence. Confidence is the most important of these measures, but it is asymmetric and takes only nonnegative values, which makes generating association rules difficult. In this paper, we apply similarity measures based on the probabilistic interestingness measure (PIM) to solve this problem. Support, two confidences, lift, and several PIM-based similarity measures are compared using a numerical example. As a result, we found that the PIM-based similarity measures show the degree of association in the same way as confidence, and that the sign of their values confirms the direction of the association.
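
For reference, the three standard rule-quality measures mentioned above can be computed as follows. This is a minimal sketch over a small made-up basket dataset. The PIM-based similarity measures studied in the paper are not reproduced because the abstract does not define them.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, lhs, rhs):
    """conf(X -> Y) = supp(X u Y) / supp(X); generally differs from conf(Y -> X)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


def lift(transactions, lhs, rhs):
    """lift(X -> Y) = conf(X -> Y) / supp(Y) = supp(X u Y) / (supp(X) supp(Y))."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
]
print(confidence(baskets, {"milk"}, {"bread"}), confidence(baskets, {"bread"}, {"milk"}))
print(lift(baskets, {"milk"}, {"bread"}))
```

Here confidence differs by direction (2/3 for milk -> bread versus 0.5 for bread -> milk), which illustrates its asymmetry. Lift is symmetric, and its value below 1 hints at a negative association that a nonnegative confidence cannot signal. That missing sign information is what the paper's signed measures aim to supply.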

On the Categorical Variable Clustering

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society / Vol. 7, No. 2 / pp. 219-226 / 1996
  • The basic objective of cluster analysis is to discover natural groupings of items or variables. In general, variable clustering has been conducted using similarity measures between variables with binary characteristics. We propose a variable clustering method for variables that have more than two categories ordered in some sense. We also consider several measures of association as similarities between variables. A numerical example is included.
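
The abstract does not name its measures of association, so the sketch below assumes one standard choice for ordered categories, the Goodman-Kruskal gamma. It then clusters variables by single linkage on |gamma| at a hypothetical threshold. This illustrates the general idea only and is not the author's method. The variables are made up.

```python
from itertools import combinations


def gamma(x, y):
    """Goodman-Kruskal gamma between two ordinal variables: (C - D) / (C + D)."""
    c = d = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            c += 1          # concordant pair
        elif s < 0:
            d += 1          # discordant pair; ties are ignored
    return (c - d) / (c + d) if c + d else 0.0


def cluster_variables(columns, threshold):
    """Single-linkage clustering: join variables whose |gamma| >= threshold."""
    parent = list(range(len(columns)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(columns)), 2):
        if abs(gamma(columns[i], columns[j])) >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(columns)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical ordinal variables (e.g. 1-5 scale answers from five respondents).
v1 = [1, 2, 3, 4, 5]
v2 = [1, 2, 3, 5, 4]
v3 = [3, 1, 4, 1, 5]
print(cluster_variables([v1, v2, v3], threshold=0.5))
```

With these data, gamma(v1, v2) = 0.8 and the other pairs fall below 0.5, so v1 and v2 form one cluster and v3 stays alone. Any other association measure for ordered categories could be dropped in place of `gamma`.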

Distance measure between intuitionistic fuzzy sets and its application to pattern recognition

  • Park, Jin-Han;Lim, Ki-Moon;Kwun, Young-Chel
    • Journal of the Korean Institute of Intelligent Systems / Vol. 19, No. 4 / pp. 556-561 / 2009
  • In this paper, we propose a new method to calculate the distance between intuitionistic fuzzy sets (IFSs) based on the three-dimensional representation of IFSs, and we analyze the relations between the similarity measure and the distance measure of IFSs. Finally, we apply the proposed measures to pattern recognition.
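
The paper's own formula is not given in the abstract, so the following sketch swaps in a widely used three-parameter normalized Hamming distance. In the three-dimensional representation, each element of an IFS carries membership mu, non-membership nu, and hesitancy pi = 1 - mu - nu. The sketch uses the similarity 1 - d for minimum-distance pattern recognition. Pattern names and values are hypothetical.

```python
def ifs_distance(a, b):
    """Normalized Hamming distance over (mu, nu, pi), with pi = 1 - mu - nu.

    a, b: lists of (mu, nu) pairs over the same universe, each with mu + nu <= 1.
    Returns a value in [0, 1]; the corresponding similarity is 1 - distance.
    """
    total = 0.0
    for (mu_a, nu_a), (mu_b, nu_b) in zip(a, b):
        pi_a, pi_b = 1 - mu_a - nu_a, 1 - mu_b - nu_b
        total += abs(mu_a - mu_b) + abs(nu_a - nu_b) + abs(pi_a - pi_b)
    return total / (2 * len(a))


def classify(sample, patterns):
    """Assign sample to the pattern with minimum distance (maximum similarity)."""
    return min(patterns, key=lambda name: ifs_distance(sample, patterns[name]))


# Hypothetical known patterns and an unknown sample over a 3-element universe.
patterns = {
    "A1": [(1.0, 0.0), (0.8, 0.0), (0.7, 0.1)],
    "A2": [(0.8, 0.1), (1.0, 0.0), (0.9, 0.0)],
    "A3": [(0.6, 0.2), (0.8, 0.0), (1.0, 0.0)],
}
sample = [(0.5, 0.3), (0.6, 0.2), (0.8, 0.1)]
print(classify(sample, patterns))
```

Including pi in the distance is what makes this a three-dimensional treatment. Two elements with the same membership but different splits between non-membership and hesitancy are counted as different.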

Semantic Process Retrieval with Similarity Algorithms (유사도 알고리즘을 활용한 시맨틱 프로세스 검색방안)

  • Lee, Hong-Joo;Klein, Mark
    • Asia Pacific Journal of Information Systems / Vol. 18, No. 1 / pp. 79-96 / 2008
  • One of the roles of Semantic Web services is to execute dynamic intra-organizational services, including the integration and interoperation of business processes. Since different organizations design their processes differently, retrieving similar semantic business processes is necessary to support inter-organizational collaboration. Most approaches for finding services that have certain features and support certain business processes have relied on some type of logical reasoning and exact matching. This paper presents our approach of using imprecise matching to expand the results of an exact matching engine when querying the OWL (Web Ontology Language) version of the MIT Process Handbook. The MIT Process Handbook is an electronic repository of best-practice business processes. The Handbook is intended to help people (1) redesign organizational processes, (2) invent new processes, and (3) share ideas about organizational practices. To use the MIT Process Handbook for process retrieval experiments, we had to export it into an OWL-based format. We model the Process Handbook meta-model in OWL and export the processes in the Handbook as instances of the meta-model. Next, we needed to find a sizable number of queries and their corresponding correct answers in the Process Handbook. Many previous studies devised artificial datasets composed of randomly generated numbers without real meaning, and used subjective ratings for correct answers and for similarity values between processes. To generate a semantics-preserving test dataset, we use mutation operators to create 20 variants of each target process that are syntactically different but semantically equivalent. These variants represent the correct answers for the target process. We devise diverse similarity algorithms based on the values of process attributes and the structures of business processes.
We use simple text-retrieval similarity algorithms such as TF-IDF and Levenshtein edit distance, and we use a tree edit distance measure because semantic processes appear to have a graph structure. We also design similarity algorithms that consider the similarity of process structure, such as part processes, goals, and exceptions. Since we can identify relationships between a semantic process and its subcomponents, this information can be used to calculate similarities between processes. Dice's coefficient and the Jaccard similarity measure are used to calculate the portion of overlap between processes in diverse ways. We perform retrieval experiments to compare the performance of the devised similarity algorithms, measuring retrieval performance in terms of precision, recall, and the F-measure, the harmonic mean of precision and recall. The tree edit distance shows the poorest performance on all measures. TF-IDF and the method combining TF-IDF with Levenshtein edit distance perform better than the other devised methods; both focus on the similarity between the names and descriptions of processes. In addition, we calculate a rank correlation coefficient, Kendall's tau-b, between the number of process mutations and the ranking of similarity values within the mutation sets. In this experiment, similarity measures based on process structure, such as Dice's, Jaccard, and their derivatives, show greater coefficients than measures based on the values of process attributes. However, the Lev-TFIDF-JaccardAll measure, which considers process structure and attribute values together, performs relatively well in both experiments. For retrieving semantic processes, it therefore appears better to consider diverse aspects of process similarity, such as process structure and the values of process attributes.
We generate semantic process data and a retrieval-experiment dataset from the MIT Process Handbook repository. We suggest imprecise query algorithms that expand the retrieval results of an exact matching engine such as SPARQL, and we compare the retrieval performance of the similarity algorithms. As for limitations and future work, experiments need to be performed with datasets from other domains. In addition, since diverse measures yield many similarity values, better ways to identify relevant processes may be found by applying these values simultaneously.
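
Several of the building blocks named in this abstract are standard, so they can be sketched with the stdlib alone. The sketch covers Levenshtein edit distance, Dice's coefficient, Jaccard similarity, the F-measure, and Kendall's tau-b. TF-IDF and tree edit distance are omitted, and how the authors apply these measures to OWL process descriptions is not shown. The set inputs below stand in for hypothetical process components.

```python
import math
from itertools import combinations


def levenshtein(s, t):
    """Edit distance: minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def dice(a, b):
    """Dice's coefficient 2|A & B| / (|A| + |B|); two empty sets count as identical."""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 1.0


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    return len(a & b) / len(a | b) if a or b else 1.0


def f_measure(retrieved, relevant):
    """Harmonic mean of precision and recall for one query."""
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    p, r = hits / len(retrieved), hits / len(relevant)
    return 2 * p * r / (p + r)


def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((pairs untied in x) * (pairs untied in y))."""
    conc = disc = tied_x = tied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue                  # tied in both: excluded from both factors
        if dx == 0:
            tied_x += 1
        elif dy == 0:
            tied_y += 1
        elif dx * dy > 0:
            conc += 1
        else:
            disc += 1
    denom = math.sqrt((conc + disc + tied_x) * (conc + disc + tied_y))
    return (conc - disc) / denom if denom else 0.0


parts_a = {"sell", "ship", "invoice"}   # hypothetical subcomponents of process A
parts_b = {"ship", "invoice", "refund"}  # hypothetical subcomponents of process B
print(levenshtein("Sell product", "Sell products"), dice(parts_a, parts_b), jaccard(parts_a, parts_b))
```

Dice and Jaccard rank pairs identically (Dice = 2J / (1 + J)), so they differ only in scale. Tau-b's tie correction matters here because several mutation variants can share the same number of mutations.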