• Title/Abstract/Keywords: similarity-based

Search results: 3,604 items; processing time: 0.031 seconds

Empirical Comparison of Word Similarity Measures Based on Co-Occurrence, Context, and a Vector Space Model

  • Kadowaki, Natsuki;Kishida, Kazuaki
    • Journal of Information Science Theory and Practice
    • /
    • Vol. 8, No. 2
    • /
    • pp.6-17
    • /
    • 2020
  • Word similarity is often measured to enhance system performance in the information retrieval field and other related areas. This paper reports on an experimental comparison of values for word similarity measures that were computed based on 50 intentionally selected words from a Reuters corpus. There were three targets, including (1) co-occurrence-based similarity measures (for which a co-occurrence frequency is counted as the number of documents or sentences), (2) context-based distributional similarity measures obtained from a latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), and Word2Vec algorithm, and (3) similarity measures computed from the tf-idf weights of each word according to a vector space model (VSM). Here, a Pearson correlation coefficient for a pair of VSM-based similarity measures and co-occurrence-based similarity measures according to the number of documents was highest. Group-average agglomerative hierarchical clustering was also applied to similarity matrices computed by individual measures. An evaluation of the cluster sets according to an answer set revealed that VSM- and LDA-based similarity measures performed best.
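The document-level co-occurrence counting described in this abstract can be illustrated with a minimal sketch. The abstract does not say which co-occurrence coefficients were used, so the Jaccard and Dice coefficients below are assumed choices, and the toy corpus and function names are hypothetical.

```python
# Toy corpus (hypothetical); each document is reduced to its set of words.
docs = [
    "oil prices rise as crude supply falls",
    "crude oil exports fall",
    "bank raises interest rates",
    "interest rates and oil prices",
]
doc_sets = [set(d.split()) for d in docs]


def doc_freq(word):
    """Number of documents that contain the word."""
    return sum(1 for s in doc_sets if word in s)


def cooccurrence(w1, w2):
    """Number of documents that contain both words."""
    return sum(1 for s in doc_sets if w1 in s and w2 in s)


def jaccard(w1, w2):
    """Jaccard coefficient: co-occurrence over documents containing either word."""
    both = cooccurrence(w1, w2)
    either = doc_freq(w1) + doc_freq(w2) - both
    return both / either if either else 0.0


def dice(w1, w2):
    """Dice coefficient: twice the co-occurrence over the summed frequencies."""
    total = doc_freq(w1) + doc_freq(w2)
    return 2 * cooccurrence(w1, w2) / total if total else 0.0
```

Counting sentences instead of documents only requires splitting the corpus into sentences before building `doc_sets`.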

Development of the Recommender System of Arabic Books Based on the Content Similarity

  • Alotaibi, Shaykhah Hajed;Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 8
    • /
    • pp.175-186
    • /
    • 2022
  • This article develops a content-based recommendation system for Arabic books that helps users find suitable books matching their literary preferences. The system directs users to books that meet their needs within a large dataset. It makes predictions from data gathered from different books, which it converts to vectors using TF-IDF. Recommendation algorithms based on cosine similarity, sequence matcher similarity, and semantic similarity then aggregate the data to produce recommendations. This approach is advantageous for recommending previously unrated books to users with unique interests. The results show that cosine similarity on the full content of the books, sequence matcher similarity on the Arabic titles, and semantic similarity on the English titles performed best and were extremely close to the average human-assigned (annotated) similarity. A Flask web application with a simple interface was developed to show the Arabic books recommended by the cosine similarity, sequence matcher similarity, and semantic similarity algorithms across all the experiments conducted.
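Two of the measures named in this abstract, TF-IDF cosine similarity and sequence matcher similarity, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the smoothed idf formula, whitespace tokenization, sample texts, and function names are all hypothetical, and English text stands in for the Arabic data. `difflib.SequenceMatcher` is the standard-library class, which may or may not be what the paper used.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tfidf_vectors(texts):
    """Build sparse TF-IDF vectors (dict: word -> weight) for a list of texts."""
    tokenized = [t.lower().split() for t in texts]
    n = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Assumed smoothed idf variant: log((1 + n) / (1 + df)) + 1
        vectors.append(
            {w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def title_similarity(a, b):
    """Character-level sequence matcher ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical book descriptions.
books = [
    "a history of desert poetry",
    "desert poetry and its history",
    "introduction to modern physics",
]
vecs = tfidf_vectors(books)
```

Here `cosine(vecs[0], vecs[1])` is larger than `cosine(vecs[0], vecs[2])` because the first two descriptions share several terms while the third shares none.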

Fuzzy Entropy Construction based on Similarity Measure

  • Park, Wook-Je;Park, Hyun-Jeong;Lee, Sang-H
    • Korean Institute of Intelligent Systems: Conference Proceedings
    • /
    • Proceedings of the Korean Institute of Intelligent Systems 2007 Fall Conference
    • /
    • pp.366-369
    • /
    • 2007
  • In this paper, we derive a fuzzy entropy based on a similarity measure. A similarity measure represents the degree of similarity between two pieces of information, regardless of the characteristics of that information. First, we construct a similarity measure between two pieces of information and then derive entropy functions from the obtained similarity measure. The obtained entropy is verified by proof. A one-to-one similarity is also obtained through a distance measure, and this similarity measure is likewise proved in the paper.
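The abstract gives no formulas, so the following is only a sketch of how an entropy can be built from a distance-based similarity. It assumes the normalized Hamming distance, the similarity s(A, B) = 1 - d(A, B), and the entropy e(A) = 2(1 - s(A, A_near)), where A_near is the crisp set nearest to A. These are common textbook choices and are not claimed to be the paper's construction; all function names are hypothetical.

```python
def hamming_distance(a, b):
    """Normalized Hamming distance between two membership vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def similarity(a, b):
    """Distance-based similarity: 1 when the fuzzy sets coincide."""
    return 1.0 - hamming_distance(a, b)


def nearest_crisp(a):
    """Crisp set obtained by rounding each membership value at 0.5."""
    return [1.0 if x >= 0.5 else 0.0 for x in a]


def fuzzy_entropy(a):
    """Entropy from similarity: 0 for crisp sets, 1 when every membership is 0.5."""
    return 2.0 * (1.0 - similarity(a, nearest_crisp(a)))
```

Under these assumptions, a crisp set such as `[0.0, 1.0, 1.0, 0.0]` has entropy 0, and the maximally fuzzy set `[0.5, 0.5, 0.5, 0.5]` has entropy 1.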

Similarity Classifier based on Schweizer & Sklars t-norms

  • Luukka, P.;Sampo, J.
    • Institute of Control, Robotics and Systems: Conference Proceedings
    • /
    • Institute of Control, Robotics and Systems, ICCAS 2004
    • /
    • pp.1053-1056
    • /
    • 2004
  • In this article, we apply similarity measures based on Schweizer & Sklar's t-norms to a classification task. We compare the results with classification based on a fuzzy similarity measure and show that these measures can sometimes yield better results. We also show that classification results are less sensitive to p values with the Schweizer & Sklar measures than with fuzzy similarity. This matters when one does not have the luxury of tuning such parameters but needs good classification results fast.
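For orientation, the Schweizer-Sklar family of t-norms that the abstract refers to can be sketched as below. Only the case p > 0 is covered, and the function name is hypothetical. The similarity measure and classifier built on top of the t-norm are not described in the abstract and are not reproduced here.

```python
def schweizer_sklar_tnorm(a, b, p):
    """Schweizer-Sklar t-norm for p > 0: max(0, a^p + b^p - 1)^(1/p).

    a and b are membership degrees in [0, 1]. With p = 1 this reduces to
    the Lukasiewicz t-norm max(0, a + b - 1).
    """
    if p <= 0:
        raise ValueError("this sketch only covers p > 0")
    return max(0.0, a ** p + b ** p - 1.0) ** (1.0 / p)


# How the same pair of membership degrees behaves for several p values.
values_by_p = {p: schweizer_sklar_tnorm(0.8, 0.9, p) for p in (0.5, 1.0, 2.0, 4.0)}
```

Inspecting `values_by_p` gives a quick feel for how strongly the result depends on p for a fixed pair of inputs.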

A Max-Flow-Based Similarity Measure for Spectral Clustering

  • Cao, Jiangzhong;Chen, Pei;Zheng, Yun;Dai, Qingyun
    • ETRI Journal
    • /
    • Vol. 35, No. 2
    • /
    • pp.311-320
    • /
    • 2013
  • In most spectral clustering approaches, the Gaussian kernel-based similarity measure is used to construct the affinity matrix. However, such a similarity measure does not work well on a dataset with a nonlinear and elongated structure. In this paper, we present a new similarity measure to deal with the nonlinearity issue. The maximum flow between data points is computed as the new similarity, which can satisfy the requirement for similarity in the clustering method. Additionally, the new similarity carries the global and local relations between data. We apply it to spectral clustering and compare the proposed similarity measure with other state-of-the-art methods on both synthetic and real-world data. The experiment results show the superiority of the new similarity: 1) The max-flow-based similarity measure can significantly improve the performance of spectral clustering; 2) It is robust and not sensitive to the parameters.
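The baseline that this abstract contrasts against, a Gaussian kernel affinity matrix, can be sketched as follows. This is not the proposed max-flow similarity. The zeroed diagonal is a common convention assumed here, and `sigma`, the sample points, and the function name are hypothetical.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Affinity matrix with w_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    The diagonal is left at zero (assumed convention).
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    return w


# Hypothetical 2-D points: two close together and one far away.
affinity = gaussian_affinity([(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)])
```

Because the affinity depends only on the pairwise Euclidean distance, two points at opposite ends of an elongated cluster receive a low value, which is the weakness the abstract points out.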

Assessment of performance of machine learning based similarities calculated for different English translations of Holy Quran

  • Al Ghamdi, Norah Mohammad;Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 4
    • /
    • pp.111-118
    • /
    • 2022
  • This article applies different machine-learning-based similarity techniques to religious text to identify similarities and differences among its translations. The dataset includes 10 different English translations of the verses (Arabic: Ayah) of two Surahs (chapters), Al-Humazah and An-Nasr. Quantitative similarity values between different translations of the same verse were calculated using cosine similarity and semantic similarity. The corpus went through two series of experiments: before pre-processing and after pre-processing. To assess the performance of the machine-learning-based similarities, human-annotated similarities between translations of the two Surahs were recorded as the ground truth. For Surah Al-Humazah, the average difference between the human-annotated similarity and the cosine similarity was 1.38 per verse per pair of translations; after pre-processing, it increased to 2.24. The average difference between the human-annotated similarity and the semantic similarity for Al-Humazah was 0.09 per verse per pair of translations; after pre-processing, it increased to 0.78. For Surah An-Nasr, the average difference between the human-annotated similarity and the cosine similarity was 1.93 per verse per pair of translations before pre-processing and increased to 2.47 after pre-processing. The average difference between the human-annotated similarity and the semantic similarity for An-Nasr was 0.93 before pre-processing and decreased to 0.87 after pre-processing. The results showed that, as expected, semantic similarity was the better indicator for measuring word meaning.
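The evaluation figure reported in this abstract, an average difference per verse per pair of translations, can be sketched as a mean of differences over (verse, translation, translation) keys. Whether the paper uses the absolute difference is an assumption, as is the rating scale; the scores and function name below are hypothetical and are not the paper's data.

```python
def average_difference(human, computed):
    """Mean absolute difference over the keys present in both score tables.

    Keys are (verse, translation_a, translation_b) tuples.
    """
    shared = human.keys() & computed.keys()
    if not shared:
        raise ValueError("no shared (verse, pair) keys to compare")
    return sum(abs(human[k] - computed[k]) for k in shared) / len(shared)


# Hypothetical scores for one verse and two pairs of translations.
human_scores = {(1, "A", "B"): 4.0, (1, "A", "C"): 3.0}
cosine_scores = {(1, "A", "B"): 3.0, (1, "A", "C"): 3.5}
gap = average_difference(human_scores, cosine_scores)
```

Running the same function on scores computed before and after pre-processing would yield the kind of before/after comparison the abstract reports.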

Dynamic gesture recognition using a model-based temporal self-similarity and its application to taebo gesture recognition

  • Lee, Kyoung-Mi;Won, Hey-Min
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 7, No. 11
    • /
    • pp.2824-2838
    • /
    • 2013
  • Considerable attention has recently been paid to analyzing dynamic human gestures that vary over time. Most work on dynamic gestures concerns spatio-temporal features rather than analyzing each frame of a gesture separately. For accurate dynamic gesture recognition, motion feature extraction algorithms need to find representative features that uniquely identify time-varying gestures. This paper proposes a new feature-extraction algorithm using temporal self-similarity based on a hierarchical human model. Because the conventional temporal self-similarity method computes the movement of the whole body across consecutive frames, it cannot distinguish different gestures that involve the same amount of movement. The proposed model-based temporal self-similarity method groups the body parts of a hierarchical model into several sets and calculates the movement of each set. Although recognition results can depend on how the sets are formed, the best way to find optimal sets is to separate frequently used body parts from less-used ones. We then apply a multiclass support vector machine whose optimization algorithm is based on structural support vector machines. The effectiveness of the proposed feature extraction algorithm is demonstrated in an application to taebo gesture recognition. We show that the model-based temporal self-similarity method overcomes the shortcomings of the conventional method and that its recognition results are superior to those of the conventional method.
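The idea of computing temporal self-similarity separately for each set of body parts can be sketched as below. The joint names, the grouping into arms and legs, the 2-D coordinates, and the use of Euclidean distance between frames are all hypothetical; the paper's hierarchical model and feature definition are not given in the abstract.

```python
import math


def self_similarity_matrix(frames, joints):
    """Pairwise frame distances restricted to one body-part set.

    frames: list of dicts mapping joint name -> (x, y).
    joints: joint names belonging to the set. Entry [i][j] is 0 when
    frames i and j show the same pose for these joints.
    """
    def dist(f, g):
        return math.sqrt(
            sum((f[j][0] - g[j][0]) ** 2 + (f[j][1] - g[j][1]) ** 2 for j in joints)
        )

    return [[dist(f, g) for g in frames] for f in frames]


# Hypothetical grouping and a short sequence in which only the hands move.
body_sets = {"arms": ["l_hand", "r_hand"], "legs": ["l_foot", "r_foot"]}
frames = [
    {"l_hand": (0, 0), "r_hand": (1, 0), "l_foot": (0, -2), "r_foot": (1, -2)},
    {"l_hand": (0, 1), "r_hand": (1, 1), "l_foot": (0, -2), "r_foot": (1, -2)},
    {"l_hand": (0, 2), "r_hand": (1, 2), "l_foot": (0, -2), "r_foot": (1, -2)},
]
matrices = {name: self_similarity_matrix(frames, js) for name, js in body_sets.items()}
```

In this toy sequence the `arms` matrix has non-zero off-diagonal entries while the `legs` matrix is all zeros, so the per-set matrices show where the movement occurs, which a single whole-body matrix would not.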

Automatic Music Summarization Using Similarity Measure Based on Multi-Level Vector Quantization

  • 김성탁;김상호;김회린
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 26, No. 2E
    • /
    • pp.39-43
    • /
    • 2007
  • Music summarization refers to a technique that automatically extracts the most important and representative segments of music content. In this paper, we propose and evaluate a technique that provides the repeated part of the music content as the music summary. To extract a repeated segment, the proposed algorithm uses a weighted sum of similarity measures based on multi-level vector quantization, for either a fixed-length or an optimal-length summary. Two similarity measures are proposed: a count-based measure and a distance-based measure. The count-based measure uses the number of identical codewords at the same position in two segments, and the distance-based measure uses the Mahalanobis distance between features that have the same codeword at the same position. The fixed-length summary is evaluated by measuring the overlap ratio between hand-labeled repeated parts and automatically generated ones. The optimal-length summary is evaluated by calculating how much of the repeated parts of the music content the automatically generated summary includes. The experiments show that, relative to summary length, the optimal-length summary captures the repeated parts of music content more effectively than the fixed-length summary.
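The count-based measure and the weighted sum over quantization levels can be sketched as below. Dividing the match count by the segment length, the example weights, and the codeword sequences are assumptions made for illustration. The distance-based (Mahalanobis) measure is omitted.

```python
def count_similarity(seg_a, seg_b):
    """Fraction of positions at which two codeword sequences agree."""
    if len(seg_a) != len(seg_b) or not seg_a:
        raise ValueError("segments must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seg_a, seg_b) if a == b)
    return matches / len(seg_a)


def multilevel_similarity(levels_a, levels_b, weights):
    """Weighted sum of count-based similarities, one term per quantization level."""
    return sum(
        w * count_similarity(a, b) for w, a, b in zip(weights, levels_a, levels_b)
    )


# Hypothetical codeword sequences for two segments at two quantization levels.
segment_a = [[1, 2, 3, 4], [0, 0, 1, 1]]
segment_b = [[1, 2, 0, 4], [0, 0, 1, 1]]
score = multilevel_similarity(segment_a, segment_b, weights=[0.5, 0.5])
```

The level weights control how much each codebook contributes to the final score; the abstract does not say how they were chosen.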

Fuzzy Entropy Construction based on Similarity Measure

  • 박현정;양인석;류수록;이상혁
    • Journal of Korean Institute of Intelligent Systems
    • /
    • Vol. 18, No. 2
    • /
    • pp.257-261
    • /
    • 2008
  • In this paper, we derive a fuzzy entropy based on a similarity measure. A similarity measure represents the degree of similarity between two pieces of information, regardless of the characteristics of that information. First, we construct a similarity measure between two pieces of information and then derive entropy functions from the obtained similarity measure. The obtained entropy is verified by proof. A one-to-one similarity is also obtained through a distance measure, and this similarity measure is likewise proved in the paper.

Similarity-based Caching Replacement Loss Minimization in Wireless Mobile Proxy Systems

  • 이종득
    • Journal of the Korea Navigation Institute
    • /
    • Vol. 16, No. 3
    • /
    • pp.455-462
    • /
    • 2012
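  • In wireless mobile proxy caching architectures, the loss caused by cache replacement has a significant effect on streaming QoS. To minimize the loss incurred during cache replacement, this paper proposes SCLM (Similarity-based Caching Loss Minimization), a similarity-based technique. The proposed technique partitions objects into segments and then establishes similarity relations among them. From the segments for which similarity relations have been established, an SRT (Similarity Relation Tree) is generated and similarity is measured. Similarity is an important measure for determining relevance feedback, and segments that satisfy relevance are stored in cache blocks for cache replacement. Simulation results show that the proposed technique achieves more efficient QoS than the prefix caching, segment caching, and bps caching techniques in terms of caching startup delay control rate, cache throughput, and cache response rate.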