• Title/Abstract/Keywords: vector similarity

371 results (processing time: 0.025 s)

On the Study of Perfect Coverage for Recommender System

  • Lee, Hee-Choon;Lee, Seok-Jun
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 17, No. 4
    • /
    • pp.1151-1160
    • /
    • 2006
  • Pearson's correlation coefficient, a similarity weight used in recommender systems, has the weakness that it cannot produce a prediction for every target value. Vector similarity, another similarity weight, gives higher prediction coverage than Pearson's correlation coefficient but suffers from a high MAE. The purpose of this study is to suggest a way to raise prediction coverage. The MAE and the prediction coverage of the suggested method were compared with those obtained using Pearson's correlation coefficient and using vector similarity. The MAE of the suggested method was higher than that obtained with Pearson's correlation coefficient but lower than that obtained with vector similarity. The prediction coverage of the suggested method was higher than that of both Pearson's correlation coefficient and vector similarity.

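
The coverage gap described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' method. It assumes (this is not stated in the abstract) that Pearson's correlation is computed over co-rated items only and is undefined when two users share fewer than two items or when one user's co-rated ratings are constant, while vector similarity is the cosine of the full rating vectors with unrated items treated as 0. The users and ratings are hypothetical.

```python
import math

def co_rated(u, v):
    """Items rated by both users (ratings are dicts: item -> score)."""
    return sorted(set(u) & set(v))

def pearson(u, v):
    """Pearson correlation over co-rated items; None when it is undefined."""
    items = co_rated(u, v)
    if len(items) < 2:
        return None  # too few shared items to correlate
    mean_u = sum(u[i] for i in items) / len(items)
    mean_v = sum(v[i] for i in items) / len(items)
    num = sum((u[i] - mean_u) * (v[i] - mean_v) for i in items)
    den = math.sqrt(sum((u[i] - mean_u) ** 2 for i in items)) * \
          math.sqrt(sum((v[i] - mean_v) ** 2 for i in items))
    if den == 0:
        return None  # one user gave identical ratings to all shared items
    return num / den

def vector_similarity(u, v):
    """Cosine of the two rating vectors, treating unrated items as 0."""
    num = sum(u[i] * v[i] for i in co_rated(u, v))
    den = math.sqrt(sum(r * r for r in u.values())) * \
          math.sqrt(sum(r * r for r in v.values()))
    return num / den if den else None

# Hypothetical users for illustration only.
alice = {"a": 5, "b": 3, "c": 4}
bob = {"c": 4, "d": 2}             # shares a single item with alice
carol = {"a": 4, "b": 4, "c": 4}   # constant ratings on the shared items
dave = {"a": 4, "b": 2, "c": 3}    # same rating pattern as alice, shifted

for name, other in [("bob", bob), ("carol", carol), ("dave", dave)]:
    print(name, pearson(alice, other), vector_similarity(alice, other))
```

Under these assumptions, Pearson yields no weight for bob or carol, so neither neighbour can contribute to a prediction, whereas vector similarity still returns a weight for both.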

A Study on the Maximizing Coverage for Recommender System

  • 이희춘;이석준;박지원;김철승
    • Korean Data and Information Science Society: Conference Proceedings
    • /
    • Proceedings of the Korean Data and Information Science Society 2006 Autumn Conference
    • /
    • pp.119-128
    • /
    • 2006
  • Pearson's correlation coefficient, a similarity weight used in recommender systems, has the weakness that it cannot produce a prediction for every target value. Vector similarity, another similarity weight, gives higher prediction coverage than Pearson's correlation coefficient but suffers from a high MAE. The purpose of this study is to suggest a way to raise prediction coverage. The MAE and the prediction coverage of the suggested method were compared with those obtained using Pearson's correlation coefficient and using vector similarity. The MAE of the suggested method was higher than that obtained with Pearson's correlation coefficient but lower than that obtained with vector similarity. The prediction coverage of the suggested method was higher than that of both Pearson's correlation coefficient and vector similarity.


SSF: Sentence Similar Function Based on word2vector Similar Elements

  • Yuan, Xinpan;Wang, Songlin;Wan, Lanjun;Zhang, Chengyuan
    • Journal of Information Processing Systems
    • /
    • Vol. 15, No. 6
    • /
    • pp.1503-1516
    • /
    • 2019
  • In this paper, to improve the accuracy of similarity calculation for long sentences, we propose a sentence similarity calculation method based on a system similarity function. The algorithm uses word2vector as the system elements for calculating sentence similarity. Its higher accuracy derives from two characteristics: the negative effect of the penalty item, and the fact that the sentence similar function (SSF) based on word2vector similar elements does not satisfy the exchange rule. In later studies, we found that the time complexity of the algorithm depends on the process of calculating similar elements, so we build an index of potentially similar elements while training the word vectors. Finally, the experimental results show that our algorithm is more accurate than the word mover's distance (WMD) and has the shortest query time among three calculation methods of SSF.

Finger-Knuckle-Print Verification Using Vector Similarity Matching of Keypoints

  • 김민기
    • 한국멀티미디어학회논문지
    • /
    • Vol. 16, No. 9
    • /
    • pp.1057-1066
    • /
    • 2013
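  • Personal authentication using the finger-knuckle-print (FKP) exploits the features of the creases that appear on the finger knuckle, for which the orientation information of the texture is an important feature. In this paper, we propose a method that extracts keypoints with the SIFT algorithm and effectively verifies FKPs through vector similarity matching. A vector is defined as the direction vector connecting a keypoint extracted from the query image to its corresponding keypoint in the reference image. Because direction vectors are generated from local keypoint pairs, each direction vector by itself represents only a local feature, but comparing its similarity with the other vectors that exist between the two images has the advantage of extending it to a global feature. Experimental results show that the proposed method outperforms various existing methods based on orientation codes.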
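
The idea of comparing direction vectors across matched keypoint pairs can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's algorithm: keypoint matches are given as hypothetical coordinate pairs (no SIFT extraction is performed), similarity between direction vectors is taken to be cosine similarity, and the `threshold` parameter and the `consistency_score` function are invented for this example.

```python
import math

def direction_vectors(matches):
    """Vector from each query keypoint to its matched reference keypoint."""
    return [(xr - xq, yr - yq) for (xq, yq), (xr, yr) in matches]

def cosine(a, b):
    """Cosine similarity of two 2-D vectors (0.0 if either has zero length)."""
    na, nb = math.hypot(*a), math.hypot(*b)
    if na == 0 or nb == 0:
        return 0.0
    return (a[0] * b[0] + a[1] * b[1]) / (na * nb)

def consistency_score(matches, threshold=0.9):
    """Fraction of matches whose direction agrees, on average, with the others.

    The threshold is an assumed value chosen for illustration.
    """
    vecs = direction_vectors(matches)
    if len(vecs) < 2:
        return 0.0
    consistent = 0
    for i, v in enumerate(vecs):
        others = [cosine(v, w) for j, w in enumerate(vecs) if j != i]
        if sum(others) / len(others) >= threshold:
            consistent += 1
    return consistent / len(vecs)

# Hypothetical matches: ((x_query, y_query), (x_reference, y_reference)).
genuine = [((10, 10), (15, 12)), ((30, 40), (35, 42)), ((50, 20), (55, 22))]
impostor = [((10, 10), (15, 12)), ((30, 40), (25, 38)), ((50, 20), (48, 25))]

print(consistency_score(genuine), consistency_score(impostor))
```

In this toy data the genuine pair shifts every keypoint by the same offset, so all direction vectors agree, while the impostor pair produces directions that contradict each other.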

A Real-Time Concept-Based Text Categorization System using the Thesaurus Tool

  • 강원석;강현규
    • 한국정보과학회논문지:소프트웨어및응용
    • /
    • Vol. 26, No. 1
    • /
    • pp.167-167
    • /
    • 1999
  • The majority of text categorization systems use the term-based classification method. However, because there are too many terms, this method is not effective for classifying documents in a real-time environment. This paper presents a real-time concept-based text categorization system, which classifies texts using a thesaurus. The system consists of a Korean morphological analyzer, a thesaurus tool, and a probability-vector similarity measurer. The thesaurus tool acquires the meanings of input terms and represents the text not as a term vector but as a concept vector. Because the concept vector consists of semantic units and is small in size, it enables the system to analyze text in real time. Because it represents the meaning of the text, the vector also supports concept-based classification. The probability-vector similarity measurer decides the subject of the text by calculating the vector similarity between the input text and each subject. The experimental results show that the proposed system can effectively analyze texts in real time and perform concept-based classification. The experiments also indicate that the thesaurus tool must be expanded to improve the system.

Support Vector Machine Classification of Hyperspectral Image using Spectral Similarity Kernel

  • 최재완;변영기;김용일;유기윤
    • 대한공간정보학회지
    • /
    • Vol. 14, No. 4 (Serial No. 38)
    • /
    • pp.71-77
    • /
    • 2006
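  • The Support Vector Machine (SVM), which is grounded in statistical learning theory, is a learning algorithm based on the structural risk minimization principle. In general, an SVM uses a kernel to determine nonlinear boundaries and classify data. However, because existing kernels measure similarity using the inner product of two vectors or the distance between them, they cannot be applied effectively to hyperspectral image classification. To address this, this paper proposes a spectral similarity kernel. The spectral similarity kernel is a local kernel that computes both the distance and the angular difference between two vectors, so it can effectively account for the spectral characteristics of hyperspectral images. To verify this, land-cover classification was performed on a Hyperion image using SVM classifiers with a polynomial kernel and an RBF kernel and an SVM classifier with the spectral similarity kernel. The classification results confirmed that the SVM classifier with the spectral similarity kernel gave the best results both quantitatively and spatially.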


Automatic Music Summarization Using Similarity Measure Based on Multi-Level Vector Quantization

  • 김성탁;김상호;김회린
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 26, No. 2E
    • /
    • pp.39-43
    • /
    • 2007
  • Music summarization refers to a technique that automatically extracts the most important and representative segments of music content. In this paper, we propose and evaluate a technique that provides the repeated part of the music content as the music summary. To extract a repeated segment, the proposed algorithm uses the weighted sum of similarity measures based on multi-level vector quantization, for either a fixed-length summary or an optimal-length summary. Two similarity measures are proposed: a count-based measure and a distance-based measure. The count-based measure uses the number of identical codewords, and the distance-based measure uses the Mahalanobis distance between features that have the same codeword at the same position in the segments. The fixed-length music summary is evaluated by measuring the overlap ratio between hand-made repeated parts and automatically generated ones. The optimal-length music summary is evaluated by calculating how much of the repeated parts of the music content the automatically generated summary includes. From the experiments we observed that, relative to summary length, the optimal-length summary captured the repeated parts of the music content more effectively than the fixed-length summary.

Survey on Vector Similarity Measures: Focusing on Algebraic Characteristics

  • 이동주;심준호
    • 한국전자거래학회지
    • /
    • Vol. 17, No. 4
    • /
    • pp.209-219
    • /
    • 2012
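  • In an e-commerce system environment, products, product reviews, and user characteristics are key information objects. Vectors are widely used as a representation technique for objects. E-commerce data objects can be modeled as vectors and represented by numeric values in the dimension corresponding to each feature. By the nature of e-commerce, these objects grow to a vast volume, and many of them may in fact be identical or similar objects. Measuring the similarity between objects therefore plays an important role in e-commerce systems. This paper surveys representative similarity measure functions used for vector objects. Each similarity function has its own algebraic characteristics, and these characteristics are interrelated. We analyze these characteristics and also classify the similarity functions. This process suggests the algebraic characteristics that a standard vector similarity function should have.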

Word Sense Similarity Clustering Based on Vector Space Model and HAL

  • 김동성
    • 인지과학
    • /
    • Vol. 23, No. 3
    • /
    • pp.295-322
    • /
    • 2012
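  • This study clusters word sense similarity by applying the vector space model and HAL (Hyperspace Analog to Language). We adopt HAL, which measures the association between words through a context of fixed size (Lund and Burgess 1996), and, to reduce the distortion caused by high-frequency and low-frequency words being measured differently, apply the vector space model to measure the cosine similarity of word pairs (Salton et al. 1975, Widdows 2004). Because the space produced by HAL and the vector space model is high-dimensional, PCA (Principal Component Analysis) and SVD (Singular Value Decomposition) were applied to reduce the dimensionality. For similarity clustering, both unsupervised and supervised methods were applied: clustering for the unsupervised approach, and SVM (Support Vector Machine), the Naive Bayes classifier, and Maximum Entropy for the supervised approach. From a linguistic perspective, this study measures semantic similarity using the Distributional Hypothesis of Harris (1954) and Firth (1957); from a psycholinguistic perspective, it combines the vector space model and HAL as a model for explaining semantic memory; and from a computational language-processing perspective, it applies both supervised and unsupervised machine learning methods.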

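
The combination of a HAL-style co-occurrence space with cosine similarity, as described in the abstract above, might look roughly like the following Python sketch. It is a simplified illustration, not the study's implementation: the window size, the linear distance weighting (`window - d + 1`), the `hal_vectors` function, and the toy sentence are all assumptions made for this example, and no PCA, SVD, or clustering step is included.

```python
import math
from collections import defaultdict

def hal_vectors(tokens, window=2):
    """Build sparse co-occurrence vectors with a sliding window.

    Closer neighbours get larger weights. Left and right contexts are kept
    as separate dimensions, keyed by ("L", word) and ("R", word).
    """
    vecs = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            if i - d < 0:
                break
            context = tokens[i - d]
            weight = window - d + 1
            vecs[word][("L", context)] += weight   # context appears left of word
            vecs[context][("R", word)] += weight   # word appears right of context
    return vecs

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    num = sum(a[k] * b[k] for k in a if k in b)
    den = math.sqrt(sum(x * x for x in a.values())) * \
          math.sqrt(sum(x * x for x in b.values()))
    return num / den if den else 0.0

# Toy corpus for illustration only.
tokens = "the cat drinks milk the dog drinks water".split()
vecs = hal_vectors(tokens, window=2)

sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_the = cosine(vecs["cat"], vecs["the"])
print(sim_cat_dog, sim_cat_the)
```

Because cosine similarity normalizes by vector length, a frequent word such as "the" does not dominate simply by having larger raw counts, which is the kind of frequency distortion the abstract says the vector space model is meant to reduce.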

Empirical Comparison of Word Similarity Measures Based on Co-Occurrence, Context, and a Vector Space Model

  • Kadowaki, Natsuki;Kishida, Kazuaki
    • Journal of Information Science Theory and Practice
    • /
    • Vol. 8, No. 2
    • /
    • pp.6-17
    • /
    • 2020
  • Word similarity is often measured to enhance system performance in the information retrieval field and other related areas. This paper reports on an experimental comparison of values for word similarity measures that were computed based on 50 intentionally selected words from a Reuters corpus. There were three targets, including (1) co-occurrence-based similarity measures (for which a co-occurrence frequency is counted as the number of documents or sentences), (2) context-based distributional similarity measures obtained from a latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), and Word2Vec algorithm, and (3) similarity measures computed from the tf-idf weights of each word according to a vector space model (VSM). Here, a Pearson correlation coefficient for a pair of VSM-based similarity measures and co-occurrence-based similarity measures according to the number of documents was highest. Group-average agglomerative hierarchical clustering was also applied to similarity matrices computed by individual measures. An evaluation of the cluster sets according to an answer set revealed that VSM- and LDA-based similarity measures performed best.