• Title/Summary/Keyword: Statistical similarity

Statistical Fingerprint Recognition Matching Method with an Optimal Threshold and Confidence Interval

  • Hong, C.S.;Kim, C.H.
    • The Korean Journal of Applied Statistics
    • /
    • v.25 no.6
    • /
    • pp.1027-1036
    • /
    • 2012
  • Among various biometric recognition systems, we consider statistical fingerprint matching methods that use the minutiae of fingerprints. We define similarity distance measures based on the coordinates and angles of the minutiae, and propose a fingerprint recognition model in which these distances follow statistical distributions. We obtain confidence intervals of the similarity distance for the same person and for different persons, as well as optimal thresholds that minimize the two kinds of error rates for the distance distributions. The two confidence intervals are found not to overlap, and the optimal threshold lies between them. Hence, an alternative statistical matching method can be proposed that uses the non-overlapping confidence intervals and the optimal thresholds obtained from the distributions of similarity distances.
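
The threshold selection described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the similarity distances for the same person and for different persons each follow a normal distribution; the abstract does not name the distributions, and the function name `optimal_threshold` and all parameter values below are hypothetical.

```python
from statistics import NormalDist


def optimal_threshold(same, diff, steps=10_000):
    """Grid-search the threshold t that minimizes FRR(t) + FAR(t).

    A pair is declared a match when its distance is <= t, so
    FRR(t) = P(same-person distance > t)   (false rejection)
    FAR(t) = P(different-person distance <= t)   (false acceptance)
    """
    lo, hi = same.mean, diff.mean  # search between the two means
    best_t, best_err = lo, float("inf")
    for i in range(steps + 1):
        t = lo + (hi - lo) * i / steps
        frr = 1.0 - same.cdf(t)
        far = diff.cdf(t)
        if frr + far < best_err:
            best_t, best_err = t, frr + far
    return best_t, best_err


# Hypothetical distance distributions (assumed values, not from the paper).
same = NormalDist(mu=2.0, sigma=0.5)
diff = NormalDist(mu=6.0, sigma=0.5)
t, err = optimal_threshold(same, diff)
print(f"threshold={t:.3f}, total error={err:.2e}")
```

With equal variances the minimizing threshold falls midway between the two means; with unequal variances it shifts toward the narrower distribution.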

Some new similarity based approaches in approximate reasoning and their applications to pattern recognition

  • Swapan Raha;Nikhil R. Pal;Ray, Kumar-Sankar
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 1998.06a
    • /
    • pp.719-724
    • /
    • 1998
  • This paper presents a systematic development of a formal approach to inference in approximate reasoning. We introduce some measures of similarity and discuss their properties. Using the concept of a similarity index, we formulate two methods for inferring from vague knowledge. To illustrate the effectiveness of the proposed technique, we use it to develop a vowel recognition system.

The Strength of the Relationship between Semantic Similarity and the Subcategorization Frames of the English Verbs: a Stochastic Test based on the ICE-GB and WordNet (영어 동사의 의미적 유사도와 논항 선택 사이의 연관성 : ICE-GB와 WordNet을 이용한 통계적 검증)

  • Song, Sang-Houn;Choe, Jae-Woong
    • Language and Information
    • /
    • v.14 no.1
    • /
    • pp.113-144
    • /
    • 2010
  • The primary goal of this paper is to find a feasible way to answer the question: Does the similarity in meaning between verbs relate to the similarity in their subcategorization? To answer this question concretely on the basis of a large set of English verbs, this study made use of various language resources, tools, and statistical methodologies. We first compiled a list of 678 verbs selected from the most and second most frequent word lists of the Collins COBUILD English Dictionary that also appear in WordNet 3.0. We calculated similarity measures between all pairs of these words using the 'jcn' algorithm (Jiang and Conrath, 1997) implemented in the WordNet::Similarity module (Pedersen, Patwardhan, and Michelizzi, 2004). The clustering process followed: we first built similarity matrices from the similarity values, then drew dendrograms on the basis of the matrices, and finally obtained 177 meaningful clusters (covering 437 verbs) that passed a threshold set by z-score. The subcategorization frames and their frequency values were taken from the ICE-GB. To calculate the Selectional Preference Strength (SPS) of the relationship between a verb and its subcategorizations, we relied on the Kullback-Leibler divergence model (Resnik, 1996). The SPS values of the verbs in the same cluster were compared with each other, yielding statistical values that indicate how much the SPS values overlap between the subcategorization frames of the verbs. Our final analysis shows that the degree of overlap, that is, the strength of the relationship between semantic similarity and the subcategorization frames of English verbs, is spread evenly from 'very strongly related' to 'very weakly related'. Some semantically similar verbs share a lot in terms of their subcategorization frames, some others show an average degree of strength in the relationship, and the rest, though still semantically similar, tend to share little in their subcategorization frames.
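
The Kullback-Leibler formulation of selectional preference strength mentioned in this abstract can be sketched as follows. This is an illustration only: it treats SPS as the divergence between a verb's frame distribution P(frame | verb) and the overall frame distribution P(frame), and the frame labels and probabilities are invented for the example rather than taken from the ICE-GB.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits for two discrete distributions given as dicts.

    Assumes q[k] > 0 wherever p[k] > 0; terms with p[k] == 0 contribute 0.
    """
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)


def selectional_preference_strength(verb_frames, prior_frames):
    """SPS of a verb: how far its frame distribution departs from the prior."""
    return kl_divergence(verb_frames, prior_frames)


# Hypothetical frame distributions (labels and values are assumptions).
prior = {"intransitive": 0.5, "transitive": 0.5}
picky_verb = {"intransitive": 0.0, "transitive": 1.0}
neutral_verb = {"intransitive": 0.5, "transitive": 0.5}

sps_picky = selectional_preference_strength(picky_verb, prior)
sps_neutral = selectional_preference_strength(neutral_verb, prior)
print(sps_picky, sps_neutral)
```

A verb whose frames mirror the overall distribution has an SPS of zero, while a verb restricted to a single frame has a larger value.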

Clustering-based Statistical Machine Translation Using Syntactic Structure and Word Similarity (문장구조 유사도와 단어 유사도를 이용한 클러스터링 기반의 통계기계번역)

  • Kim, Han-Kyong;Na, Hwi-Dong;Li, Jin-Ji;Lee, Jong-Hyeok
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.4
    • /
    • pp.297-304
    • /
    • 2010
  • Clustering based on sentence type or document genre is a technique used to improve the translation quality of SMT (statistical machine translation) through domain-specific translation. However, no previous research has used sentence type and document genre information simultaneously. In this paper, we propose an integrated clustering method that classifies sentence type by syntactic structure similarity and document genre by word similarity. We interpolated domain-specific models built from the clusters with general models to improve the translation quality of the SMT system. A kernel function and the cosine measure are applied to calculate structural similarity and word similarity, respectively. With these similarities, we used machine learning algorithms similar to K-means for clustering. On a Japanese-English patent translation corpus, we obtained a 2.5% point relative improvement in translation quality in the best case.
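
The cosine measure for word similarity can be illustrated with a short sketch. It assumes a plain bag-of-words representation with whitespace tokenization, which the abstract does not specify; the function name `cosine_similarity` and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


sim_same = cosine_similarity("the valve opens", "the valve opens")
sim_none = cosine_similarity("the valve opens", "a circuit closes")
sim_part = cosine_similarity("the valve opens", "the valve closes")
print(sim_same, sim_none, sim_part)
```

Documents with higher pairwise cosine values would then be grouped into the same genre cluster by a K-means-like procedure.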

Statistical Consideration on the Similarity in Dissolution Profile of Two Fast Releasing Tablets (속용성 정제간의 용출유사성에 대한 통계학적 고찰)

  • Cho, Jung-Hwan;Lee, Se-Hee;Kim, Hee-Sun;Oh, Seaung-Youl
    • Journal of Pharmaceutical Investigation
    • /
    • v.30 no.2
    • /
    • pp.85-91
    • /
    • 2000
  • We studied the dissolution kinetics of two fast releasing tablets in four media and compared the similarity of their dissolution profiles using three methods. Two of the methods are statistical distance methods: the maximum distance and the Mahalanobis distance. The dissolution kinetics were also analysed using the FDA method for similarity evaluation, and the results were compared with those obtained using the distance methods.
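
Two of the comparisons named in this abstract can be sketched briefly. The sketch assumes that the "FDA method" refers to the f2 similarity factor from FDA dissolution guidance and that the maximum distance is the largest absolute difference between the profiles across time points; the abstract confirms neither detail, and the profile values are invented.

```python
import math


def f2_similarity(reference, test):
    """f2 = 50 * log10(100 / sqrt(1 + mean squared difference)).

    Inputs are percent-dissolved values at matching time points.
    Identical profiles give 100; values of 50 or more are conventionally
    read as similar.
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("profiles must be non-empty and of equal length")
    msd = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + msd))


def max_distance(reference, test):
    """Largest absolute difference between the two profiles (assumed form)."""
    return max(abs(r - t) for r, t in zip(reference, test))


# Hypothetical percent-dissolved profiles at four time points.
ref = [30.0, 55.0, 80.0, 95.0]
shifted = [20.0, 45.0, 70.0, 85.0]  # uniformly 10 points lower

f2_identical = f2_similarity(ref, ref)
f2_shifted = f2_similarity(ref, shifted)
d_max = max_distance(ref, shifted)
print(f2_identical, f2_shifted, d_max)
```

A uniform 10-point difference puts f2 just under 50, which is why that value is commonly associated with an average difference of about 10%.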

Reliable Data Selection using Similarity Measure (유사측도를 이용한 신뢰성 있는 데이터의 추출)

  • Ryu, Soo-Rok;Lee, Sang-Hyuk
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.2
    • /
    • pp.200-205
    • /
    • 2008
  • For data analysis, fuzzy entropy is introduced as a measure of fuzziness, and a similarity measure is constructed to represent the similarity between data. The similarity measure between fuzzy membership functions is constructed from a distance measure, and the proposed measure is proved to be a similarity measure. The proposed similarity measure is then applied to an example of reliable data selection. The results are compared with previous results obtained through fuzzy entropy and statistical knowledge.

A New Statistical Index for Detecting Cheaters on Multiple Choice Tests (다중선택 시험에서 부정행위자 발견을 위한 새로운 통계적 측도)

  • Han, Eun Su;Lim, Johan;Lee, Kyeong Eun
    • The Korean Journal of Applied Statistics
    • /
    • v.26 no.1
    • /
    • pp.81-92
    • /
    • 2013
  • It is important to construct a firm basis for accusing potential violators of academic integrity in order to avoid spurious accusations and false convictions. Educational researchers have developed many statistical methods that can either uncover or confirm cases of cheating on tests. However, most of them rely on simple correlation-based measures and often fail to account for patterns in responses or answers. In this paper, we propose a new statistical index, called the Standardized Signed Entropy Similarity Score, to resolve this difficulty. In addition, we apply the proposed method to a real data set and compare the results with those of other existing methods.

Recovery Levels of Clustering Algorithms Using Different Similarity Measures for Functional Data

  • Chae, Seong San;Kim, Chansoo;Warde, William D.
    • Communications for Statistical Applications and Methods
    • /
    • v.11 no.2
    • /
    • pp.369-380
    • /
    • 2004
  • Clustering algorithms with different similarity measures are commonly used to find an optimal clustering or one close to the original clustering. The recovery levels obtained with Euclidean distance and with distances transformed from correlation coefficients are evaluated and compared using Rand's (1971) C statistic. The C values indicate how close the resultant clustering is to the original clustering. In a simulation study, the recovery level is improved by applying the correlation coefficients between objects. Using the data set from Spellman et al. (1998), the recovery levels with different similarity measures are also presented. In general, the recovery level of the true clusters was increased by using the correlation coefficients.
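
Rand's C statistic, used here to measure recovery, is the fraction of object pairs on which two clusterings agree. The following is a minimal sketch of that definition; the function name `rand_index` and the label vectors are illustrative and not taken from the paper.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs treated the same way by both clusterings.

    A pair counts as an agreement when both clusterings put the two objects
    together, or both put them apart. Label values themselves do not matter.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have equal length")
    if len(labels_a) < 2:
        raise ValueError("at least two objects are required")
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


true_clusters = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]  # same partition, different label names
crossed = [0, 1, 0, 1]     # splits every true cluster

c_perfect = rand_index(true_clusters, relabelled)
c_crossed = rand_index(true_clusters, crossed)
print(c_perfect, c_crossed)
```

A value of 1 means the original clustering was fully recovered, so higher C values correspond to higher recovery levels.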

Development of a New Similarity Index to Compare Time-series Profile Data for Animal and Human Experiments (동물 및 임상 시험의 시계열 프로파일 데이터 비교를 위한 유사성 지수 개발)

  • Lee, Ye Gyoung;Lee, Hyun Jeong;Jang, Hyeon Ae;Shin, Sangmun
    • Journal of Korean Society for Quality Management
    • /
    • v.49 no.2
    • /
    • pp.145-159
    • /
    • 2021
  • Purpose: Statistical similarity evaluation for comparing pharmacokinetic (PK) profile data between nonclinical and clinical experiments has become a significant issue in many drug development processes. This study proposes a new similarity index that considers important parameters, such as the area under the curve (AUC), and the time-series profile of various PK data. Methods: A new profile similarity index (PSI), based on the concept of a process capability index (Cp), is proposed in order to identify the animal PK profile most similar to the target (i.e., the human PK profile). The proposed PSI can be calculated as the geometric and arithmetic means of the short-term similarity indices at all time points of the animal and human PK time series. Designed simulation approaches are demonstrated for verification purposes. Results: Two different simulation studies are conducted by considering three variances (i.e., small, medium, and large) as well as three characteristic types (smaller-the-better, larger-the-better, and nominal-the-best). Using the proposed PSI, the animal PK profile most similar to the target human PK profile can be identified in the simulation studies. In addition, a case study yields results that differ from those of existing simple statistical analysis methods (i.e., root mean squared error and quality loss). Conclusion: The proposed PSI can effectively estimate the level of similarity between animal and human PK profiles. Using these PSI results, the number of animal experiments can be reduced by focusing only on the significant animal with a high PSI value.

Semi-supervised learning using similarity and dissimilarity

  • Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society
    • /
    • v.22 no.1
    • /
    • pp.99-105
    • /
    • 2011
  • We propose a semi-supervised learning algorithm based on a form of regularization that incorporates similarity and dissimilarity penalty terms. Our approach uses a graph-based encoding of similarity and dissimilarity. We also present a model-selection method that employs cross-validation to choose the hyperparameters that affect the performance of the proposed method. Simulations using two types of data sets demonstrate that the proposed method is promising.