• Title/Summary/Keyword: similarity-based

Search Result 3,591, Processing Time 0.028 seconds

Empirical Comparison of Word Similarity Measures Based on Co-Occurrence, Context, and a Vector Space Model

  • Kadowaki, Natsuki;Kishida, Kazuaki
    • Journal of Information Science Theory and Practice
    • /
    • v.8 no.2
    • /
    • pp.6-17
    • /
    • 2020
  • Word similarity is often measured to enhance system performance in the information retrieval field and other related areas. This paper reports on an experimental comparison of values for word similarity measures that were computed based on 50 intentionally selected words from a Reuters corpus. There were three targets, including (1) co-occurrence-based similarity measures (for which a co-occurrence frequency is counted as the number of documents or sentences), (2) context-based distributional similarity measures obtained from a latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), and Word2Vec algorithm, and (3) similarity measures computed from the tf-idf weights of each word according to a vector space model (VSM). Here, a Pearson correlation coefficient for a pair of VSM-based similarity measures and co-occurrence-based similarity measures according to the number of documents was highest. Group-average agglomerative hierarchical clustering was also applied to similarity matrices computed by individual measures. An evaluation of the cluster sets according to an answer set revealed that VSM- and LDA-based similarity measures performed best.

Development of the Recommender System of Arabic Books Based on the Content Similarity

  • Alotaibi, Shaykhah Hajed;Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.8
    • /
    • pp.175-186
    • /
    • 2022
  • This research article develops an Arabic books' recommendation system, which is based on the content similarity that assists users to search for the right book and predict the appropriate and suitable books pertaining to their literary style. In fact, the system directs its users toward books, which can meet their needs from a large dataset of Information. Further, this system makes its predictions based on a set of data that is gathered from different books and converts it to vectors by using the TF-IDF system. After that, the recommendation algorithms such as the cosine similarity, the sequence matcher similarity, and the semantic similarity aggregate data to produce an efficient and effective recommendation. This approach is advantageous in recommending previously unrated books to users with unique interests. It is found to be proven from the obtained results that the results of the cosine similarity of the full content of books, the results of the sequence matcher similarity of Arabic titles of the books, and the results of the semantic similarity of English titles of the books are the best obtained results, and extremely close to the average of the result related to the human assigned/annotated similarity. Flask web application is developed with a simple interface to show the recommended Arabic books by using cosine similarity, sequence matcher similarity, and semantic similarity algorithms with all experiments that are conducted.

Fuzzy Entropy Construction based on Similarity Measure (유사측도에 기반한 퍼지 엔트로피구성)

  • Park, Wook-Je;Park, Hyun-Jeong;Lee, Sang-H
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2007.11a
    • /
    • pp.366-369
    • /
    • 2007
  • In this paper we derived fuzzy entropy that is based on similarity measure. Similarity measure represents the degree of similarity between two informations, those informations characteristics are not important. First we construct similarity measure between two informations, and derived entropy functions with obtained similarity measure. Obtained entropy is verified with proof. With the help of one-to-one similarity is also obtained through distance measure, this similarity measure is also proved in our paper.

  • PDF

Similarity Classifier based on Schweizer & Sklars t-norms

  • Luukka, P.;Sampo, J.
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2004.08a
    • /
    • pp.1053-1056
    • /
    • 2004
  • In this article we have applied Schweizer & Sklars t-norm based similarity measures to classification task. We will compare results to fuzzy similarity measure based classification and show that sometimes better results can be found by using these measures than fuzzy similarity measure. We will also show that classification results are not so sensitive to p values with Schweizer & Sklars measures than when fuzzy similarity is used. This is quite important when one does not have luxury of tuning these kind of parameters but needs good classification results fast.

  • PDF

A Max-Flow-Based Similarity Measure for Spectral Clustering

  • Cao, Jiangzhong;Chen, Pei;Zheng, Yun;Dai, Qingyun
    • ETRI Journal
    • /
    • v.35 no.2
    • /
    • pp.311-320
    • /
    • 2013
  • In most spectral clustering approaches, the Gaussian kernel-based similarity measure is used to construct the affinity matrix. However, such a similarity measure does not work well on a dataset with a nonlinear and elongated structure. In this paper, we present a new similarity measure to deal with the nonlinearity issue. The maximum flow between data points is computed as the new similarity, which can satisfy the requirement for similarity in the clustering method. Additionally, the new similarity carries the global and local relations between data. We apply it to spectral clustering and compare the proposed similarity measure with other state-of-the-art methods on both synthetic and real-world data. The experiment results show the superiority of the new similarity: 1) The max-flow-based similarity measure can significantly improve the performance of spectral clustering; 2) It is robust and not sensitive to the parameters.

Assessment of performance of machine learning based similarities calculated for different English translations of Holy Quran

  • Al Ghamdi, Norah Mohammad;Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.4
    • /
    • pp.111-118
    • /
    • 2022
  • This research article presents the work that is related to the application of different machine learning based similarity techniques on religious text for identifying similarities and differences among its various translations. The dataset includes 10 different English translations of verses (Arabic: Ayah) of two Surahs (chapters) namely, Al-Humazah and An-Nasr. The quantitative similarity values for different translations for the same verse were calculated by using the cosine similarity and semantic similarity. The corpus went through two series of experiments: before pre-processing and after pre-processing. In order to determine the performance of machine learning based similarities, human annotated similarities between translations of two Surahs (chapters) namely Al-Humazah and An-Nasr were recorded to construct the ground truth. The average difference between the human annotated similarity and the cosine similarity for Surah (chapter) Al-Humazah was found to be 1.38 per verse (ayah) per pair of translation. After pre-processing, the average difference increased to 2.24. Moreover, the average difference between human annotated similarity and semantic similarity for Surah (chapter) Al-Humazah was found to be 0.09 per verse (Ayah) per pair of translation. After pre-processing, it increased to 0.78. For the Surah (chapter) An-Nasr, before preprocessing, the average difference between human annotated similarity and cosine similarity was found to be 1.93 per verse (Ayah), per pair of translation. And. After pre-processing, the average difference further increased to 2.47. The average difference between the human annotated similarity and the semantic similarity for Surah An-Nasr before preprocessing was found to be 0.93 and after pre-processing, it was reduced to 0.87 per verse (ayah) per pair of translation. The results showed that as expected, the semantic similarity was proven to be better measurement indicator for calculation of the word meaning.

Dynamic gesture recognition using a model-based temporal self-similarity and its application to taebo gesture recognition

  • Lee, Kyoung-Mi;Won, Hey-Min
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.7 no.11
    • /
    • pp.2824-2838
    • /
    • 2013
  • There has been a lot of attention paid recently to analyze dynamic human gestures that vary over time. Most attention to dynamic gestures concerns with spatio-temporal features, as compared to analyzing each frame of gestures separately. For accurate dynamic gesture recognition, motion feature extraction algorithms need to find representative features that uniquely identify time-varying gestures. This paper proposes a new feature-extraction algorithm using temporal self-similarity based on a hierarchical human model. Because a conventional temporal self-similarity method computes a whole movement among the continuous frames, the conventional temporal self-similarity method cannot recognize different gestures with the same amount of movement. The proposed model-based temporal self-similarity method groups body parts of a hierarchical model into several sets and calculates movements for each set. While recognition results can depend on how the sets are made, the best way to find optimal sets is to separate frequently used body parts from less-used body parts. Then, we apply a multiclass support vector machine whose optimization algorithm is based on structural support vector machines. In this paper, the effectiveness of the proposed feature extraction algorithm is demonstrated in an application for taebo gesture recognition. We show that the model-based temporal self-similarity method can overcome the shortcomings of the conventional temporal self-similarity method and the recognition results of the model-based method are superior to that of the conventional method.

Automatic Music Summarization Using Similarity Measure Based on Multi-Level Vector Quantization (다중레벨 벡터양자화 기반의 유사도를 이용한 자동 음악요약)

  • Kim, Sung-Tak;Kim, Sang-Ho;Kim, Hoi-Rin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.2E
    • /
    • pp.39-43
    • /
    • 2007
  • Music summarization refers to a technique which automatically extracts the most important and representative segments in music content. In this paper, we propose and evaluate a technique which provides the repeated part in music content as music summary. For extracting a repeated segment in music content, the proposed algorithm uses the weighted sum of similarity measures based on multi-level vector quantization for fixed-length summary or optimal-length summary. For similarity measures, count-based similarity measure and distance-based similarity measure are proposed. The number of the same codeword and the Mahalanobis distance of features which have same codeword at the same position in segments are used for count-based and distance-based similarity measure, respectively. Fixed-length music summary is evaluated by measuring the overlapping ratio between hand-made repeated parts and automatically generated ones. Optimal-length music summary is evaluated by calculating how much automatically generated music summary includes repeated parts of the music content. From experiments we observed that optimal-length summary could capture the repeated parts in music content more effectively in terms of summary length than fixed-length summary.

Fuzzy Entropy Construction based on Similarity Measure

  • Park, Hyun-Jeong;Yang, In-Suk;Ryu, Soo-Rok;Lee, Sang-Hyuk
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.2
    • /
    • pp.257-261
    • /
    • 2008
  • In this Paper we derived fuzzy entropy that is based on similarity measure. Similarity measure represents the degree of similarity between two informations, those informations characteristics are not important. First we construct similarity measure between two informations, and derived entropy functions with obtained similarity measure. Obtained entropy is verified with proof. With the help of one-to-one similarity is also obtained through distance measure, this similarity measure is also proved in our paper.

Similarity-based Caching Replacement Loss Minimization in Wireless Mobile Proxy Systems (무선 모바일 프록시 시스템에서 유사도 기반의 캐싱 손실 최소화)

  • Lee, Chong-Deuk
    • Journal of Advanced Navigation Technology
    • /
    • v.16 no.3
    • /
    • pp.455-462
    • /
    • 2012
  • The loss due to caching replacement in the wireless mobile proxy caching structure has a significant effect on streaming QoS. This paper proposes a similarity-based caching loss minimization (SCLM) for minimizing the loss caused by the caching replacement. The proposed scheme divides object segments, and then it performs the similarity relation about them. Segments that perform the similarity relation generates similarity relation tree (SRT). The similarity is an important metric for deciding a relevance feedback, and segments that satisfy these requirements in the cache block for caching replacement. Simulation results show that the proposed scheme has better performance than the existing prefix caching scheme, segment-based caching scheme, and bi-directional proxy scheme in terms of QoS, average delayed startup ratio, cache throughput, and cache response ratio.