• Title/Abstract/Keywords: similarity.

Search results: 8,094 items (processing time: 0.037 seconds)

Empirical Comparison of Word Similarity Measures Based on Co-Occurrence, Context, and a Vector Space Model

  • Kadowaki, Natsuki;Kishida, Kazuaki
    • Journal of Information Science Theory and Practice
    • /
    • Vol. 8, No. 2
    • /
    • pp.6-17
    • /
    • 2020
  • Word similarity is often measured to enhance system performance in the information retrieval field and other related areas. This paper reports on an experimental comparison of values for word similarity measures that were computed based on 50 intentionally selected words from a Reuters corpus. Three groups of measures were compared: (1) co-occurrence-based similarity measures (for which a co-occurrence frequency is counted as the number of documents or sentences), (2) context-based distributional similarity measures obtained from latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), and the Word2Vec algorithm, and (3) similarity measures computed from the tf-idf weights of each word according to a vector space model (VSM). The Pearson correlation coefficient was highest for the pair consisting of the VSM-based similarity measures and the co-occurrence-based similarity measures counted by number of documents. Group-average agglomerative hierarchical clustering was also applied to the similarity matrices computed by the individual measures. An evaluation of the cluster sets against an answer set revealed that the VSM- and LDA-based similarity measures performed best.
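
As a rough illustration of the VSM-based measure in (3), the sketch below represents each word by its tf-idf weights across documents and compares words by cosine similarity. The toy corpus, the function names, and the plain `tf * log(N/df)` weighting are assumptions made for this example; the abstract does not state the exact weighting scheme the authors used.

```python
import math

def tfidf_word_vectors(docs):
    """Return {word: [tf-idf weight of the word in each document]}.

    Assumes docs is a list of token lists and uses tf * log(N / df),
    one common tf-idf variant (an assumption, not the paper's formula).
    """
    n_docs = len(docs)
    vocab = sorted({w for doc in docs for w in doc})
    # Document frequency: number of documents that contain each word.
    df = {w: sum(1 for doc in docs if w in doc) for w in vocab}
    vectors = {}
    for w in vocab:
        idf = math.log(n_docs / df[w])
        vectors[w] = [doc.count(w) * idf for doc in docs]
    return vectors

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus, already tokenized.
docs = [
    ["oil", "price", "crude"],
    ["oil", "crude", "barrel"],
    ["bank", "rate", "loan"],
    ["bank", "loan", "price"],
]
vecs = tfidf_word_vectors(docs)
sim_oil_crude = cosine(vecs["oil"], vecs["crude"])  # same documents
sim_oil_bank = cosine(vecs["oil"], vecs["bank"])    # disjoint documents
sim_oil_price = cosine(vecs["oil"], vecs["price"])  # one shared document
```

Words that appear in exactly the same documents receive a similarity of 1, and words that never share a document receive 0, which is why this measure tends to track document-level co-occurrence counts.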

A Max-Flow-Based Similarity Measure for Spectral Clustering

  • Cao, Jiangzhong;Chen, Pei;Zheng, Yun;Dai, Qingyun
    • ETRI Journal
    • /
    • Vol. 35, No. 2
    • /
    • pp.311-320
    • /
    • 2013
  • In most spectral clustering approaches, the Gaussian kernel-based similarity measure is used to construct the affinity matrix. However, such a similarity measure does not work well on a dataset with a nonlinear and elongated structure. In this paper, we present a new similarity measure to deal with the nonlinearity issue. The maximum flow between data points is computed as the new similarity, which can satisfy the requirement for similarity in the clustering method. Additionally, the new similarity carries the global and local relations between data. We apply it to spectral clustering and compare the proposed similarity measure with other state-of-the-art methods on both synthetic and real-world data. The experiment results show the superiority of the new similarity: 1) The max-flow-based similarity measure can significantly improve the performance of spectral clustering; 2) It is robust and not sensitive to the parameters.
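
For context, the Gaussian kernel baseline that the abstract contrasts against can be sketched as below. This is not the max-flow-based measure proposed in the paper; the function name, the bandwidth parameter `sigma`, and the zeroed diagonal are assumptions chosen for the example.

```python
import math

def gaussian_affinity(points, sigma=1.0):
    """Build an affinity matrix with W[i][j] = exp(-||xi - xj||^2 / (2 * sigma^2)).

    The diagonal is left at zero, a common convention in spectral
    clustering (an assumption here, not something the abstract states).
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return W

# Hypothetical 2-D points: two close together, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
W = gaussian_affinity(points, sigma=1.0)
```

Because the affinity depends only on the straight-line distance between two points, two ends of an elongated cluster can receive a near-zero affinity even though they are connected through a chain of close neighbors, which is the weakness the abstract describes.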

A NOTE ON APPROXIMATE SIMILARITY

  • Hadwin, Don
    • Journal of the Korean Mathematical Society (대한수학회지)
    • /
    • Vol. 38, No. 6
    • /
    • pp.1157-1166
    • /
    • 2001
  • This paper answers some old questions about approximate similarity and raises new ones. We provide positive evidence and a technique for finding negative evidence on the question of whether approximate similarity is the equivalence relation generated by approximate equivalence and similarity.

멀티모달 기반 악성코드 유사도 계산 기법 (Multi-Modal Based Malware Similarity Estimation Method)

  • 유정도;김태규;김인성;김휘강
    • Journal of the Korea Institute of Information Security & Cryptology (정보보호학회논문지)
    • /
    • Vol. 29, No. 2
    • /
    • pp.347-363
    • /
    • 2019
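  • Just as human DNA does not change, malware in cyberspace has unique behavioral characteristics that do not change. To secure defenses against APT (Advanced Persistent Threat) attacks in advance, the malicious behavioral characteristics of malware must be extracted. This first requires computing the similarity between malware samples so that similar samples can be grouped together. In this paper, we use 'TF-IDF cosine similarity', 'Nilsimsa similarity', 'malware function similarity', and 'Jaccard similarity' as methods for computing the similarity between malware samples running on Windows OS, predict the malware type with each method, and present the results. The experiments showed that the prediction rate of each similarity method differed greatly depending on the malware type. No similarity measure achieved superior accuracy across all results, but the experimental results are expected to help in deciding which similarity method is relatively advantageous when classifying malware of a particular family.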
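
Of the four measures named above, Jaccard similarity is the simplest to illustrate: it is the size of the intersection of two feature sets divided by the size of their union. The sketch below is a minimal version; the feature sets are hypothetical lists of imported Windows API names chosen for illustration, since the abstract does not say which features the authors compared.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of features."""
    a, b = set(a), set(b)
    if not a and not b:
        # Two empty feature sets are treated as identical (a convention).
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical feature sets for two samples (illustrative only).
sample_a = {"CreateFileW", "WriteFile", "RegSetValueExW", "connect"}
sample_b = {"CreateFileW", "WriteFile", "connect", "send"}

similarity = jaccard(sample_a, sample_b)  # 3 shared features out of 5 distinct
```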

A Study on the Performance of Similarity Indices and its Relationship with Link Prediction: a Two-State Random Network Case

  • Ahn, Min-Woo;Jung, Woo-Sung
    • Journal of the Korean Physical Society
    • /
    • Vol. 73, No. 10
    • /
    • pp.1589-1595
    • /
    • 2018
  • A similarity index measures the topological proximity of node pairs in a complex network. Numerous similarity indices have been defined and investigated, but how their performance depends on network structure has not been sufficiently investigated. In this study, we investigated the relationship between the performance of similarity indices and the structural properties of a network by employing a two-state random network. Each node in a two-state network is initially assigned one of two types, and the connection probability is determined by the states of the node pair. The performance of similarity indices is affected by the number of links and the ratio of intra-connections to inter-connections. Similarity indices have different characteristics depending on their type. Local indices perform well in small networks and do not depend on whether the structure is intra-dominant or inter-dominant. In contrast, global indices perform better in large networks, and some such indices do not perform well in an inter-dominant structure. We also found that link prediction performance and the performance of similarity indices are correlated in both model networks and empirical networks. This relationship implies that link prediction performance can be used as an approximation for the performance of a similarity index when information about node type is unavailable. This relationship may help to find the appropriate index for a given network.
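
To make the notion of a local similarity index concrete, the sketch below scores every unconnected node pair by its number of common neighbors, one of the standard local indices used for link prediction. The abstract does not list which indices were tested, so this choice, the function name, and the toy edge list are assumptions for illustration.

```python
from itertools import combinations

def common_neighbors_scores(edges):
    """Score each non-adjacent node pair by its number of shared neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            # Only unconnected pairs are candidates for a predicted link.
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores

# Hypothetical undirected network given as an edge list.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
scores = common_neighbors_scores(edges)
# The highest-scoring pair is the most likely missing link under this index.
best_pair = max(scores, key=scores.get)
```

A local index like this looks only at immediate neighborhoods, whereas global indices use paths across the whole network.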

Similarity Classifier based on Schweizer & Sklars t-norms

  • Luukka, P.;Sampo, J.
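    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집)
    • /
    • Institute of Control, Robotics and Systems, ICCAS 2004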
    • /
    • pp.1053-1056
    • /
    • 2004
  • In this article, we apply similarity measures based on Schweizer & Sklar's t-norms to a classification task. We compare the results with classification based on a fuzzy similarity measure and show that these measures can sometimes yield better results than the fuzzy similarity measure. We also show that classification results are less sensitive to the value of p with the Schweizer & Sklar measures than when fuzzy similarity is used. This is important when one does not have the luxury of tuning such parameters but needs good classification results quickly.

Grouping DNA sequences with similarity measure and application

  • Lee, Sanghyuk
    • Journal of the Korea Convergence Society (한국융합학회논문지)
    • /
    • Vol. 4, No. 3
    • /
    • pp.35-41
    • /
    • 2013
  • The problem of grouping DNA sequences by their similarity is studied. The similarity measure and the distance measure show complementary characteristics: a distance measure can be obtained by complementing a similarity measure, and vice versa. A similarity measure is derived and proved. The proposed similarity measure is applied to the problem of grouping 25 cockroach DNA sequences. Based on the computed DNA similarity, the 25 cockroaches are clustered into four groups, and the results are compared with those of the previous neighbor-joining method.

아이템의 유사도를 고려한 트랜잭션 클러스터링 (Transactions Clustering based on Item Similarity)

  • 이상욱;김재련
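    • Korea Intelligent Information Systems Society: Conference Proceedings (한국지능정보시스템학회:학술대회논문집)
    • /
    • Korea Intelligent Information Systems Society, 2002 Fall Conference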
    • /
    • pp.250-257
    • /
    • 2002
  • Clustering is a data mining method that consists of discovering interesting data distributions in very large databases. In traditional data clustering, the similarity of a cluster of objects is measured by the pairwise similarity of the objects in that cluster. In view of the nature of clustering transactions, we devise in this paper a novel measurement called item similarity and use it to perform clustering. With this item similarity measurement, we develop an efficient clustering algorithm for target marketing in each group.

SIMILAR AND SELF-SIMILAR CURVES IN MINKOWSKI n-SPACE

  • OZDEMIR, MUSTAFA;SIMSEK, HAKAN
    • Bulletin of the Korean Mathematical Society (대한수학회보)
    • /
    • Vol. 52, No. 6
    • /
    • pp.2071-2093
    • /
    • 2015
  • In this paper, we investigate the similarity transformations in the Minkowski n-space. We study the geometric invariants of non-null curves under the similarity transformations. Besides, we extend the fundamental theorem for a non-null curve according to a similarity motion of ${\mathbb{E}}_1^n$. We determine the parametrizations of non-null self-similar curves in ${\mathbb{E}}_1^n$.

Similarity Measure Construction with Fuzzy Entropy and Distance Measure

  • Lee Sang-Hyuk
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • Vol. 5, No. 4
    • /
    • pp.367-371
    • /
    • 2005
  • The similarity measure is derived using fuzzy entropy and a distance measure. From the relations among fuzzy entropy, distance measure, and similarity measure, we first obtain the fuzzy entropy. Then, with both the fuzzy entropy and the distance measure, the similarity measure is obtained. We verify that the proposed measure is indeed a similarity measure.