• Title/Abstract/Keywords: Similarity Metrics

76 search results (processing time: 0.023 s)

A Novel Similarity Measure for Sequence Data

  • Pandi, Mohammad. H.;Kashefi, Omid;Minaei, Behrouz
    • Journal of Information Processing Systems / Vol. 7, No. 3 / pp. 413-424 / 2011
  • A variety of metrics have been introduced to measure the similarity of two given sequences. These metrics are widely used in applications ranging from spell correctors and categorizers to new sequence mining applications. Different metrics consider different aspects of sequences, but the essence of any sequence is extracted from the ordering of its elements. In this paper, we propose a novel sequence similarity measure that is based on all ordered pairs of one sequence and on a Hasse diagram built in the other sequence. In contrast with existing approaches, the idea behind the proposed sequence similarity metric is to extract all ordering features to capture sequence properties. We designed a clustering problem to evaluate our sequence similarity metric. Experimental results showed the superiority of our proposed metric in maximizing the purity of clustering compared to metrics such as d2, Smith-Waterman, Levenshtein, and Needleman-Wunsch. The limitation of those methods originates from some neglected sequence features, which our proposed sequence similarity metric takes into account.
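
The ordering idea in this abstract can be illustrated with a deliberately simplified sketch. It is not the authors' Hasse-diagram construction, whose details the abstract does not give. It only assumes that a sequence can be summarized by the set of pairs (x, y) in which x occurs before y, and it compares two sequences by the Jaccard overlap of those sets. The function names are hypothetical.

```python
from itertools import combinations


def ordered_pairs(seq):
    """Return the set of (x, y) pairs where x appears before y in seq."""
    # combinations() preserves positional order, so x always precedes y.
    return {(x, y) for x, y in combinations(seq, 2)}


def ordered_pair_similarity(a, b):
    """Jaccard overlap of the ordered-pair sets of two sequences.

    Assumption: this is an illustrative stand-in, not the measure
    proposed in the paper.
    """
    pairs_a, pairs_b = ordered_pairs(a), ordered_pairs(b)
    if not pairs_a and not pairs_b:
        return 1.0  # two sequences with no pairs are treated as identical
    return len(pairs_a & pairs_b) / len(pairs_a | pairs_b)


if __name__ == "__main__":
    print(ordered_pair_similarity("abc", "abc"))  # same order
    print(ordered_pair_similarity("abc", "acb"))  # one pair swapped
    print(ordered_pair_similarity("abc", "cba"))  # fully reversed
```

Unlike an edit distance, this score depends only on relative order, so reversing a sequence drives it to zero even though the elements are unchanged.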

Understanding of F2 Metrics Used to Evaluate Similarity of Dissolution Profiles

  • 조미현;김정호;이현태;사홍기
    • Journal of Pharmaceutical Investigation / Vol. 33, No. 3 / pp. 245-253 / 2003
  • Dissolution profile comparisons can be made using the similarity factor $(f_2)$. It is a logarithmic reciprocal square root transformation of the sum of squared errors of % dissolution differences between two profiles at several time points. It gives information on the degree of similarity between the two profiles: an $f_2$ value between 50 and 100 suggests the similarity/equivalence of the two dissolution curves being compared. The objective of this report was to examine the $f_2$ metric in detail. It was shown that $f_2$ values exceeded 50 when relative differences in % dissolved between two products were less than 15% at all time points. The similarity factor was also found to be greater than 50 when absolute % dissolution differences were below 10% at all time points. Interestingly, the $f_2$ value changed with the number of time points selected for calculation. In particular, $f_2$ tended to have higher values when the calculation used a large number of time points at which % dissolved had reached a plateau. Finally, since the similarity factor is a sample statistic, it was impossible to infer type I/II errors and sampling error. Despite certain limitations inherent in the $f_2$ metric, it was an easy and convenient way to evaluate how similar the two dissolution profiles were.
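
The transformation described in this abstract is commonly written as $f_2 = 50 \log_{10}\left(100 / \sqrt{1 + \frac{1}{n}\sum_{t=1}^{n}(R_t - T_t)^2}\right)$, where $R_t$ and $T_t$ are the % dissolved of the reference and test products at time point $t$. The sketch below assumes that form; the function name and the sample profiles are hypothetical.

```python
import math


def f2_similarity(reference, test):
    """Similarity factor f2 for two dissolution profiles (% dissolved).

    Assumes the commonly cited form:
    f2 = 50 * log10(100 / sqrt(1 + mean squared difference)).
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(reference)
    mean_sq_diff = sum((r - t) ** 2 for r, t in zip(reference, test)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq_diff))


if __name__ == "__main__":
    ref = [20.0, 40.0, 60.0, 80.0]           # hypothetical reference profile
    close = [r + 9.0 for r in ref]           # 9 points off at every time point
    far = [r + 11.0 for r in ref]            # 11 points off at every time point
    print(round(f2_similarity(ref, ref), 2))
    print(round(f2_similarity(ref, close), 2))
    print(round(f2_similarity(ref, far), 2))
```

Identical profiles give 100, and a uniform absolute difference just under 10 percentage points keeps the value above 50, which is consistent with the threshold reported in the abstract.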

An Effective Metric for Measuring the Degree of Web Page Changes

  • 권신영;김성진;이상호
    • 한국정보과학회논문지:데이타베이스 / Vol. 34, No. 5 / pp. 437-447 / 2007
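  • Various similarity measures have been used to measure the degree of change in web pages. Based on six types of web page change, this paper defines criteria for evaluating the effectiveness of change metrics and proposes a new similarity measure. Using real web pages and synthetic documents, we compare the proposed metric with five existing metrics (byte-wise comparison, TF-IDF cosine distance, word distance, edit distance, and shingling). Analysis of the experimental results shows that the proposed metric is effective for measuring web page changes. This study can serve as a guideline for choosing a suitable change metric in web applications that need information about web page changes.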
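
Of the five baseline metrics named in this abstract, shingling is the simplest to sketch. The example below assumes word-level shingles of width 3 and measures change as one minus the Jaccard resemblance of the two shingle sets. The paper's own metric is not reproduced here, and the function names and shingle width are assumptions.

```python
def shingles(text, w=3):
    """Return the set of w-word shingles (contiguous word windows) in text."""
    words = text.split()
    if not words:
        return set()
    if len(words) < w:
        return {tuple(words)}  # short documents yield a single shingle
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(old, new, w=3):
    """Jaccard resemblance of the shingle sets of two documents."""
    s_old, s_new = shingles(old, w), shingles(new, w)
    if not s_old and not s_new:
        return 1.0
    return len(s_old & s_new) / len(s_old | s_new)


def change_degree(old, new, w=3):
    """Degree of change in [0, 1]; 0 means no detected change."""
    return 1.0 - resemblance(old, new, w)


if __name__ == "__main__":
    print(change_degree("a b c d e", "a b c d e"))
    print(change_degree("a b c d e", "a b c d f"))
```

Changing a single word alters every shingle that contains it, so small edits near the middle of a page move this score more than a byte-wise comparison would suggest.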

Context-Weighted Metrics for Example Matching

  • 김동주;김한우
    • 전자공학회논문지CI / Vol. 43, No. 6 / pp. 43-51 / 2006
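  • This paper concerns metrics for comparing example sentences in example-based English-Korean machine translation, and proposes a similarity metric for finding the example sentence most similar to a given query sentence. The proposed metric is based on the edit distance algorithm; for words whose surface forms do not match, it computes similarity from the words' lemma and part-of-speech information. Although the edit distance metric depends on the order of the compared units, it treats units as contributing equally to similarity as long as their order matches, so it does not fully reflect context. To reflect context fully, this paper additionally proposes a context weight that captures continuity information for consecutive words with matching unit information. We also convert the edit distance metric, which expresses dissimilarity, into a similarity metric, and normalize the context-weighted metric so that it can be applied to sentence comparison and used to rank examples by similarity. Finally, we attempt to generalize existing methods that use linguistic information, and we run comparative experiments against these generalized methods to demonstrate the superiority of the context-weighted metric.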
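
Two steps mentioned in this abstract, converting edit distance into a similarity and normalizing it for ranking, can be sketched as follows. This is only the plain word-level baseline: the lemma and part-of-speech matching and the context weight proposed in the paper are not modeled. The normalization by the longer sentence length is an assumption, as are the function names.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(query, example):
    """Word-level similarity in [0, 1], normalized by the longer sentence."""
    q, e = query.split(), example.split()
    longest = max(len(q), len(e))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(q, e) / longest


def rank_examples(query, examples):
    """Return examples sorted from most to least similar to the query."""
    return sorted(examples, key=lambda ex: edit_similarity(query, ex),
                  reverse=True)


if __name__ == "__main__":
    query = "I like green tea"
    print(rank_examples(query, ["She likes coffee", "I like black tea"]))
```

Because every matching word counts equally here, two matches that are adjacent score the same as two that are far apart, which is the gap the paper's context weight is meant to close.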

Improving Performance of Jaccard Coefficient for Collaborative Filtering

  • Lee, Soojung
    • 한국컴퓨터정보학회논문지 / Vol. 21, No. 11 / pp. 121-126 / 2016
  • In recommender systems based on collaborative filtering, measuring similarity is critical for determining the range of recommenders. The data sparsity problem is fundamental in collaborative filtering systems and is partly solved by combining the Jaccard coefficient with traditional similarity measures. This study proposes a new coefficient that improves the performance of the Jaccard coefficient by compensating for its drawbacks. We conducted experiments using datasets of various characteristics for performance analysis. Comparing the proposed coefficient with the widely used Pearson correlation similarity metric, we found that the two yielded competitive performance on a dense dataset, while the proposed coefficient performed much better on a sparser dataset. A comparison with the Jaccard coefficient showed that the proposed coefficient performed far better as the dataset became denser. Overall, the proposed coefficient demonstrated the best prediction and recommendation performance among the metrics tested.
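
The baseline that this abstract starts from, the Jaccard coefficient combined with a traditional measure, might look like the sketch below. It assumes each user's ratings are a dict from item to rating, computes Pearson correlation over co-rated items only, and combines the two by multiplication. The multiplicative combination and all names are assumptions; the paper's improved coefficient is not shown.

```python
import math


def jaccard(ratings_u, ratings_v):
    """Share of co-rated items among all items rated by either user."""
    items_u, items_v = set(ratings_u), set(ratings_v)
    union = items_u | items_v
    if not union:
        return 0.0
    return len(items_u & items_v) / len(union)


def pearson(ratings_u, ratings_v):
    """Pearson correlation over co-rated items (means taken over them)."""
    common = sorted(set(ratings_u) & set(ratings_v))
    if len(common) < 2:
        return 0.0  # correlation is undefined for fewer than two items
    xs = [ratings_u[i] for i in common]
    ys = [ratings_v[i] for i in common]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = (math.sqrt(sum((x - mean_x) ** 2 for x in xs))
           * math.sqrt(sum((y - mean_y) ** 2 for y in ys)))
    return num / den if den else 0.0


def jaccard_weighted_pearson(ratings_u, ratings_v):
    """Assumed combination: Pearson scaled by the Jaccard coefficient."""
    return jaccard(ratings_u, ratings_v) * pearson(ratings_u, ratings_v)


if __name__ == "__main__":
    u = {"a": 5, "b": 3, "c": 4}
    v = {"b": 3, "c": 4, "d": 1}
    print(jaccard(u, v), pearson(u, v), jaccard_weighted_pearson(u, v))
```

Note that the Jaccard coefficient ignores the rating values entirely and looks only at which items were rated, which is one plausible reading of the drawbacks the abstract refers to.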

Using User Rating Patterns for Selecting Neighbors in Collaborative Filtering

  • Lee, Soojung
    • 한국컴퓨터정보학회논문지 / Vol. 24, No. 9 / pp. 77-82 / 2019
  • Collaborative filtering is a popular technique for recommender systems and is used in many practical commercial systems. Its basic principle is to select neighbors similar to the current user and to make recommendations for that user from the neighbors' past preference information on items. One of the major problems inherent in this type of system is the data sparsity of ratings. This is mainly caused by the underlying similarity measures, which produce neighbors based on rating records. This paper addresses this problem and suggests a new similarity measure. The proposed method takes users' rating patterns into account when computing similarity, rather than relying only on the commonly rated items as previous measures do. Performance experiments on various existing measures are conducted, and their performance is compared in terms of major performance metrics. As a result, the proposed measure shows better or comparable results on all the metrics considered.

Cohesion and Coupling Metrics for Component Design Model

  • 고병선;박재년
    • 정보처리학회논문지D / Vol. 10D, No. 5 / pp. 745-752 / 2003
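  • Component-based development has become widely used as a reuse technology for improving the independence and productivity of software development. Because software quality must be measurable in order to be improved, component metrics that reflect the characteristics of components are needed. This paper therefore proposes component cohesion and coupling metrics based on the component design information of a component-based system. Operation usage is derived from information about the classes that operations use in common to provide the component's services, and operation similarity is derived from operation usage. Component cohesion and coupling are calculated from operation similarity, using information that can be extracted at the component analysis stage. Finally, a case study examines the need for component metrics through a comparison with object-oriented metrics.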

Similarity Analysis Between SAR Target Images Based on Siamese Network

  • 박지훈
    • 한국군사과학기술학회지 / Vol. 25, No. 5 / pp. 462-475 / 2022
  • Unlike in the field of electro-optical (EO) image analysis, there has been less interest in similarity metrics between synthetic aperture radar (SAR) target images. A reliable and objective similarity analysis for SAR target images is expected to enable verification of the SAR measurement process or to provide guidelines for target CAD modeling that can be used to simulate realistic SAR target images. For this purpose, this paper presents a similarity analysis method based on the Siamese network that quantifies subjective assessment through distance learning on similar and dissimilar SAR target image pairs. The proposed method is applied to MSTAR SAR target images of slightly different depression angles, and the resulting metrics are compared and analyzed alongside a qualitative evaluation. Since image similarity is somewhat related to recognition performance, the capacity of the proposed method for target recognition is further checked experimentally with the confusion matrix.
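
Distance learning on similar and dissimilar pairs is often trained with a contrastive loss, so a minimal sketch of that loss is given here. The abstract does not say which loss the paper uses, so this choice is an assumption, as are the margin value and the function names. Embeddings are plain lists of floats standing in for the outputs of the two network branches.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, similar, margin=1.0):
    """One common form of the contrastive loss for a single pair.

    Similar pairs are pulled together (loss = d^2); dissimilar pairs are
    pushed apart until they are at least `margin` away
    (loss = max(0, margin - d)^2).
    """
    d = euclidean(emb_a, emb_b)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    print(contrastive_loss([0.0, 0.0], [3.0, 4.0], similar=True))
    print(contrastive_loss([0.0, 0.0], [0.5, 0.0], similar=False))
    print(contrastive_loss([0.0, 0.0], [2.0, 0.0], similar=False))
```

After training with a loss of this kind, the learned distance between two embeddings can itself be read as a similarity score, with smaller distances meaning more similar images.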

Improved Collaborative Filtering Using Entropy Weighting

  • Kwon, Hyeong-Joon
    • International Journal of Advanced Culture Technology / Vol. 1, No. 2 / pp. 1-6 / 2013
  • In this paper, we evaluate the performance of existing similarity metrics and propose a novel method that uses the information entropy of users' preferences to reduce MAE in memory-based collaborative recommender systems. The proposed method applies a similarity of individual inclination to traditional similarity measurement methods. We experiment with various similarity metrics under different conditions, including the amount of data and significance weighting from n/10 to n/60, to verify the proposed method. As a result, we confirm that the proposed method is robust and efficient on sparse data sets when applied with various existing similarity measurement methods and significance weighting.
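
Three quantities named in this abstract can be sketched independently of the paper's method: the Shannon entropy of a user's ratings, significance weighting of the form n/threshold, and mean absolute error (MAE). How the paper combines entropy with a similarity measure is not stated in the abstract and is not modeled here. The function names are hypothetical, and capping the significance weight at 1 is an assumption based on the usual formulation.

```python
import math
from collections import Counter


def rating_entropy(ratings):
    """Shannon entropy (in bits) of a user's rating distribution."""
    n = len(ratings)
    if n == 0:
        return 0.0
    counts = Counter(ratings)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def significance_weight(n_common, threshold=50):
    """Devalue similarities built on few co-rated items: min(n, t) / t."""
    return min(n_common, threshold) / threshold


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    print(rating_entropy([5, 5, 5, 5]))   # a user who always gives 5
    print(rating_entropy([1, 2, 3, 4]))   # a user who spreads ratings evenly
    print(significance_weight(5, threshold=10))
    print(mean_absolute_error([4, 3], [5, 1]))
```

A user who always gives the same rating has zero entropy, while a user who spreads ratings evenly over four values has two bits, so entropy gives a single number for how varied a user's preferences are.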

Underwater Optical Image Data Transmission in the Presence of Turbulence and Attenuation

  • Ramavath Prasad Naik;Maaz Salman;Wan-Young Chung
    • 융합신호처리학회논문지 / Vol. 24, No. 1 / pp. 1-14 / 2023
  • Underwater images carry information that is useful in the fields of aquaculture, underwater military security, navigation, transportation, and so on. In this research, we transmitted an underwater image through various underwater media in the presence of underwater turbulence and beam attenuation effects using a high-speed visible optical carrier signal. The optical beam undergoes scintillation because of the turbulence and attenuation effects; therefore, distorted images were observed at the receiver end. To understand the behavior of the communication media, we obtained the bit error rate (BER) performance of the system with respect to the average signal-to-noise ratio (SNR). Also, the structural similarity index (SSI) and peak SNR (PSNR) metrics of the received image were evaluated. Based on the received images, we employed suitable nonlinear filters to recover the distorted images and enhance them further. The BER, SSI, and PSNR metrics of the specific nonlinear filters were also evaluated and compared with the unfiltered metrics. These metrics were evaluated using the on-off keying and binary phase-shift keying modulation techniques for the 50-m and 100-m links, with beam attenuation resulting from pure seawater, clear ocean water, and coastal ocean water.
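
Two of the metrics evaluated in this abstract, BER and PSNR, have standard definitions that can be sketched directly. The example assumes 8-bit pixel values (peak 255) stored as flat lists and bit streams stored as lists of 0/1. The sample data and function names are hypothetical, and the structural similarity index is omitted for brevity.

```python
import math


def bit_error_rate(sent_bits, received_bits):
    """Fraction of bit positions that differ between sent and received."""
    if len(sent_bits) != len(received_bits) or not sent_bits:
        raise ValueError("bit streams must be non-empty and of equal length")
    errors = sum(s != r for s, r in zip(sent_bits, received_bits))
    return errors / len(sent_bits)


def psnr(original, received, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max^2 / MSE)."""
    if len(original) != len(received) or not original:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((o - r) ** 2 for o, r in zip(original, received)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(bit_error_rate([1, 0, 1, 1], [1, 1, 1, 0]))
    print(round(psnr([100, 100], [110, 90]), 2))
```

A filter that recovers the received image should raise PSNR relative to the unfiltered image, which is the comparison the abstract describes.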