Integrated Search | Korea Science

Empirical Comparison of Word Similarity Measures Based on Co-Occurrence, Context, and a Vector Space Model

Kadowaki, Natsuki;Kishida, Kazuaki
- Journal of Information Science Theory and Practice / Vol. 8, No. 2 / pp. 6-17 / 2020
Word similarity is often measured to enhance system performance in the information retrieval field and other related areas. This paper reports on an experimental comparison of values for word similarity measures that were computed based on 50 intentionally selected words from a Reuters corpus. There were three targets, including (1) co-occurrence-based similarity measures (for which a co-occurrence frequency is counted as the number of documents or sentences), (2) context-based distributional similarity measures obtained from a latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), and Word2Vec algorithm, and (3) similarity measures computed from the tf-idf weights of each word according to a vector space model (VSM). Here, a Pearson correlation coefficient for a pair of VSM-based similarity measures and co-occurrence-based similarity measures according to the number of documents was highest. Group-average agglomerative hierarchical clustering was also applied to similarity matrices computed by individual measures. An evaluation of the cluster sets according to an answer set revealed that VSM- and LDA-based similarity measures performed best.
https://doi.org/10.1633/JISTaP.2020.8.2.1
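
As a rough illustration of the VSM-based measure described in the abstract above, the following sketch represents each word by its tf-idf weights across documents and compares words by cosine similarity. The toy corpus, the function names, and the idf variant `log(N / df)` are assumptions made for this example; the paper's exact weighting scheme is not given here.

```python
import math

def tfidf_word_vectors(docs):
    """Return {word: [tf-idf weight in each document]} for tokenized documents."""
    n_docs = len(docs)
    vocab = sorted({w for doc in docs for w in doc})
    # Document frequency: number of documents that contain each word.
    df = {w: sum(1 for doc in docs if w in doc) for w in vocab}
    vectors = {}
    for w in vocab:
        idf = math.log(n_docs / df[w])  # assumed idf variant, not taken from the paper
        vectors[w] = [doc.count(w) * idf for doc in docs]
    return vectors

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["oil", "price", "rise"],
    ["oil", "price", "fall"],
    ["bank", "rate", "rise"],
]
vecs = tfidf_word_vectors(docs)
sim_oil_price = cosine(vecs["oil"], vecs["price"])  # words that share documents
sim_oil_bank = cosine(vecs["oil"], vecs["bank"])    # words that never share a document
```

Words that appear in the same documents get similar vectors and a cosine close to 1, while words that never share a document get 0.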

A Max-Flow-Based Similarity Measure for Spectral Clustering

Cao, Jiangzhong;Chen, Pei;Zheng, Yun;Dai, Qingyun
- ETRI Journal / Vol. 35, No. 2 / pp. 311-320 / 2013
In most spectral clustering approaches, the Gaussian kernel-based similarity measure is used to construct the affinity matrix. However, such a similarity measure does not work well on a dataset with a nonlinear and elongated structure. In this paper, we present a new similarity measure to deal with the nonlinearity issue. The maximum flow between data points is computed as the new similarity, which can satisfy the requirement for similarity in the clustering method. Additionally, the new similarity carries the global and local relations between data. We apply it to spectral clustering and compare the proposed similarity measure with other state-of-the-art methods on both synthetic and real-world data. The experiment results show the superiority of the new similarity: 1) The max-flow-based similarity measure can significantly improve the performance of spectral clustering; 2) It is robust and not sensitive to the parameters.
https://doi.org/10.4218/etrij.13.0112.0520
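
For context, the Gaussian kernel-based affinity that the abstract above names as the usual baseline can be sketched as follows. This is not the paper's max-flow measure, only the standard kernel `exp(-||x - y||^2 / (2 * sigma^2))`; the sample points, the `sigma` value, and the zero-diagonal convention are assumptions for this example.

```python
import math

def gaussian_affinity(points, sigma=1.0):
    """Build a symmetric affinity matrix W with W[i][j] = exp(-||xi - xj||^2 / (2 sigma^2))."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # diagonal left at zero, a common (assumed) convention
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W

# Hypothetical 2-D data: two nearby points and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
W = gaussian_affinity(points, sigma=1.0)
```

Because the kernel depends only on Euclidean distance, two points on the same elongated structure but far apart receive a very small affinity, which is the limitation the abstract refers to.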

A NOTE ON APPROXIMATE SIMILARITY

Hadwin, Don
- Journal of the Korean Mathematical Society / Vol. 38, No. 6 / pp. 1157-1166 / 2001
This paper answers some old questions about approximate similarity and raises new ones. We provide positive evidence and a technique for finding negative evidence on the question of whether approximate similarity is the equivalence relation generated by approximate equivalence and similarity.

Multi-Modal Based Malware Similarity Estimation Method

유정도;김태규;김인성;김휘강
- Journal of the Korea Institute of Information Security & Cryptology / Vol. 29, No. 2 / pp. 347-363 / 2019
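Just as human DNA does not change, malware in cyberspace has unique behavioral characteristics that do not change. To secure defenses against APT (Advanced Persistent Threat) attacks in advance, the malicious behavioral characteristics of malware must be extracted. This first requires computing the similarity between malware samples so that similar samples can be grouped together. In this paper, we use 'TF-IDF cosine similarity', 'Nilsimsa similarity', 'malware function similarity', and 'Jaccard similarity' as methods for computing the similarity between malware samples running on Windows OS, predict the malware type with each, and present the results. The experiments showed that, for each similarity computation method, the prediction rate varied greatly with the malware type. No similarity measure showed superior accuracy across all results, but the experimental results should help in deciding which similarity computation method is relatively advantageous when classifying malware of a particular family.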
https://doi.org/10.13089/JKIISC.2019.29.2.347
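
Of the four measures listed in the abstract above, Jaccard similarity is the simplest to show. The sketch below treats each sample as a set of features and computes the ratio of intersection size to union size. The feature sets (API call names) are hypothetical; the abstract does not say which features the authors extracted.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two feature collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)

# Hypothetical feature sets, e.g. API calls observed for each sample.
sample_a = {"CreateFile", "WriteFile", "RegSetValue", "Connect"}
sample_b = {"CreateFile", "WriteFile", "Connect", "Send"}
sample_c = {"OpenProcess", "VirtualAlloc"}

sim_ab = jaccard(sample_a, sample_b)  # 3 shared features out of 5 distinct ones
sim_ac = jaccard(sample_a, sample_c)  # no shared features
```

A sample could then be assigned the type of its most similar labeled sample, which is one plausible reading of "predicting the malware type" from a similarity measure.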

A Study on the Performance of Similarity Indices and its Relationship with Link Prediction: a Two-State Random Network Case

Ahn, Min-Woo;Jung, Woo-Sung
- Journal of the Korean Physical Society / Vol. 73, No. 10 / pp. 1589-1595 / 2018
A similarity index measures the topological proximity of node pairs in a complex network. Numerous similarity indices have been defined and investigated, but the dependence of their performance on network structure has not been sufficiently investigated. In this study, we investigated the relationship between the performance of similarity indices and the structural properties of a network by employing a two-state random network. A node in a two-state network has a binary type that is initially given, and a connection probability is determined from the states of the node pair. The performance of similarity indices is affected by the number of links and the ratio of intra-connections to inter-connections. Similarity indices have different characteristics depending on their type. Local indices perform well in small networks and do not depend on whether the structure is intra-dominant or inter-dominant. In contrast, global indices perform better in large networks, and some such indices do not perform well in an inter-dominant structure. We also found that link prediction performance and the performance of similarity indices are correlated in both model networks and empirical networks. This relationship implies that link prediction performance can be used as an approximation for the performance of a similarity index when information about node type is unavailable. This relationship may help to find the appropriate index for a given network.
https://doi.org/10.3938/jkps.73.1589
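
To make the notion of a local similarity index concrete, the sketch below scores each unconnected node pair by its number of common neighbors, a widely used local index for link prediction. The abstract above does not list which indices were studied, so this choice, the toy graph, and the function name are assumptions for illustration.

```python
from itertools import combinations

def common_neighbors_scores(adj):
    """Score every non-adjacent node pair by the number of neighbors they share."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:  # only score pairs that are not already linked
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores

# Hypothetical undirected graph as an adjacency mapping (node -> set of neighbors).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
    "e": set(),
}
scores = common_neighbors_scores(adj)
best_pair = max(scores, key=scores.get)  # the pair predicted as most likely to link
```

A local index like this one uses only the immediate neighborhoods of the two nodes, whereas a global index would draw on path information across the whole network.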

Similarity Classifier based on Schweizer & Sklars t-norms

Luukka, P.;Sampo, J.
- Institute of Control, Robotics and Systems: Conference Proceedings / Institute of Control, Robotics and Systems, ICCAS 2004 / pp. 1053-1056 / 2004
In this article, we apply similarity measures based on Schweizer & Sklar's t-norms to a classification task. We compare the results with classification based on a fuzzy similarity measure and show that these measures can sometimes give better results. We also show that classification results are less sensitive to p values with the Schweizer & Sklar measures than with the fuzzy similarity measure. This is important when one does not have the luxury of tuning such parameters but needs good classification results quickly.

Grouping DNA sequences with similarity measure and application

Lee, Sanghyuk
- Journal of the Korea Convergence Society / Vol. 4, No. 3 / pp. 35-41 / 2013
The grouping problem based on similarities between DNA sequences is studied. The similarity measure and the distance measure show complementary characteristics: a distance measure can be obtained by complementing a similarity measure, and vice versa. A similarity measure is derived and proved. The usefulness of the proposed similarity measure is demonstrated on the grouping problem of 25 cockroach DNA sequences. By calculating DNA similarity, the 25 cockroaches are clustered into four groups, and the results are compared with those of the previous neighbor-joining method.
https://doi.org/10.15207/JKCS.2013.4.3.035

Transactions Clustering Based on Item Similarity

이상욱;김재련
- Korea Intelligent Information Systems Society: Conference Proceedings / Korea Intelligent Information Systems Society 2002 Fall Regular Conference / pp. 250-257 / 2002
Clustering is a data mining method that consists of discovering interesting data distributions in very large databases. In traditional data clustering, the similarity of a cluster of objects is measured by the pairwise similarity of the objects in that cluster. In view of the nature of clustering transactions, we devise in this paper a novel measurement called item similarity and use it to perform clustering. With this item similarity measurement, we develop an efficient clustering algorithm for target marketing in each group.

SIMILAR AND SELF-SIMILAR CURVES IN MINKOWSKI n-SPACE

OZDEMIR, MUSTAFA;SIMSEK, HAKAN
- Bulletin of the Korean Mathematical Society / Vol. 52, No. 6 / pp. 2071-2093 / 2015
In this paper, we investigate the similarity transformations in the Minkowski n-space. We study the geometric invariants of non-null curves under the similarity transformations. Besides, we extend the fundamental theorem for a non-null curve according to a similarity motion of ${\mathbb{E}}_1^n$. We determine the parametrizations of non-null self-similar curves in ${\mathbb{E}}_1^n$.
https://doi.org/10.4134/BKMS.2015.52.6.2071

Similarity Measure Construction with Fuzzy Entropy and Distance Measure

Lee Sang-Hyuk
- International Journal of Fuzzy Logic and Intelligent Systems / Vol. 5, No. 4 / pp. 367-371 / 2005
The similarity measure is derived using fuzzy entropy and a distance measure. Using the relations among fuzzy entropy, distance measure, and similarity measure, we first obtain the fuzzy entropy. Then, with both the fuzzy entropy and the distance measure, the similarity measure is obtained. We verify that the proposed measure is a similarity measure.
https://doi.org/10.5391/IJFIS.2005.5.4.367

Search results: 8,094 items; processing time: 0.037 seconds