• Title/Abstract/Keywords: Statistics Matching

184 search results (processing time: 0.021 s)

Theoretical Peptide Mass Distribution in the Non-Redundant Protein Database of the NCBI

  • Lim Da-Jeong;Oh Hee-Seok;Kim Hee-Bal
    • Genomics & Informatics / Vol. 4, No. 2 / pp. 65-70 / 2006
  • Peptide mass mapping is the matching of experimentally generated peptide masses with the predicted masses of digested proteins contained in a database. The peptide mass fingerprinting technique identifies proteins by matching their constituent fragment masses to the theoretical peptide masses generated from a protein database. It is therefore important to know the theoretical mass distribution of the database. However, few studies have reported the peptide mass distribution of a database. We analyzed the peptide mass distribution of the non-redundant protein sequence database of the NCBI after digestion with 15 different types of enzymes. To characterize the peptide mass distribution for different digestion enzymes, a power-law distribution (Zipf's law) was applied. After simulating the digestion of the protein database, a rank-frequency plot of peptide fragments was used to derive a Zipf's law curve that generalizes across all enzymes. As a result, our data appear to fit Zipf's law with statistically significant parameter values.
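The following is a minimal sketch of the simulated-digestion and rank-frequency pipeline described above. It makes several assumptions. Trypsin is the enzyme, cleaving after K/R but not before P, with no missed cleavages. Masses are monoisotopic masses of unmodified peptides, binned to integer daltons. The Zipf exponent is fitted by ordinary least squares on the log-log rank-frequency plot. The paper's 15 enzymes, its binning, and its fitting procedure are not specified here, and all helper names are hypothetical. Sequences are also assumed to contain only the 20 standard residues; NR entries containing X, B, Z, or U would need extra handling.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Standard monoisotopic residue masses in Da for the 20 amino acids.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def trypsin_digest(seq: str) -> List[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def peptide_mass(pep: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in pep) + WATER


def rank_frequencies(proteins: Iterable[str]) -> List[int]:
    """Peptide counts per integer-Da mass bin, sorted by decreasing frequency."""
    counts = Counter(
        round(peptide_mass(p)) for seq in proteins for p in trypsin_digest(seq)
    )
    return sorted(counts.values(), reverse=True)


def fit_zipf(freqs: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log f = log C - s * log r; returns (s, C)."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return -slope, math.exp(my - slope * mx)
```

Typical use is `s, C = fit_zipf(rank_frequencies(sequences))` over a list of protein strings. Handling another enzyme would only require swapping the cleavage rule in `trypsin_digest`.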

Method of Measuring Color Difference Between Images Using Corresponding Points and Histograms

  • 황영배;김제우;최병호
    • 방송공학회논문지 (Journal of Broadcast Engineering) / Vol. 17, No. 2 / pp. 305-315 / 2012
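  • Color correction between two or more cameras is a key technique for improving the performance of subsequent algorithms, and it is especially important for stereoscopic 3D cameras. Although many color correction methods have recently been proposed, few methods exist for accurately measuring their results. Existing measures may also be unsuitable when the two images show different scenes because of the cameras' different positions. In this paper, we propose a method for measuring the color difference between images for color correction. To handle the case where the scenes in the two target images do not coincide, the method searches for correspondences to find points in the two scenes that should have the same color. It then computes the color difference from statistics calculated over the regions surrounding these correspondences. This accounts for mismatches between correspondences that can arise in existing methods, which describe the change in position between the two images with a single geometric transformation. In addition, because the correspondences may not cover every region of the image, the color difference is also measured from statistics computed over the entire image. The final color difference is a weighted sum of the correspondence-based and whole-image-based color differences. The weight is determined by how much of the image area the correspondence-based comparison covers.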
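The following is a minimal sketch of the weighted combination described above, under illustrative assumptions. Images are nested lists of 8-bit RGB tuples, and correspondences are already given as pixel pairs (no feature matching is shown). The "statistics" are normalized per-channel histograms compared by total-variation distance. The weight is the fraction of image A covered by square windows around the matched points. The paper's exact statistics, window size, and distance are not given here, and all function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255
Image = List[List[Pixel]]     # image[y][x]
Point = Tuple[int, int]       # (x, y)


def channel_histograms(pixels: Iterable[Pixel], bins: int = 16) -> List[List[float]]:
    """Normalized per-channel histograms (each channel sums to 1 when non-empty)."""
    hists = [[0.0] * bins for _ in range(3)]
    n = 0
    for p in pixels:
        n += 1
        for c in range(3):
            hists[c][p[c] * bins // 256] += 1.0
    return [[v / n for v in h] for h in hists] if n else hists


def hist_distance(ha: List[List[float]], hb: List[List[float]]) -> float:
    """Mean over channels of the total-variation distance (0 = identical, 1 = disjoint)."""
    return sum(
        0.5 * sum(abs(a - b) for a, b in zip(ca, cb)) for ca, cb in zip(ha, hb)
    ) / 3.0


def window_union(img: Image, points: Sequence[Point], r: int) -> Set[Point]:
    """All pixel coordinates inside (2r+1)x(2r+1) windows centred on the points."""
    h, w = len(img), len(img[0])
    covered: Set[Point] = set()
    for x, y in points:
        for yy in range(max(0, y - r), min(h, y + r + 1)):
            for xx in range(max(0, x - r), min(w, x + r + 1)):
                covered.add((xx, yy))
    return covered


def color_difference(img_a: Image, img_b: Image,
                     matches: Sequence[Tuple[Point, Point]],
                     r: int = 2, bins: int = 16) -> float:
    """Weighted sum of correspondence-based and whole-image colour differences."""
    d_global = hist_distance(
        channel_histograms((p for row in img_a for p in row), bins),
        channel_histograms((p for row in img_b for p in row), bins),
    )
    if not matches:
        return d_global
    cov_a = window_union(img_a, [m[0] for m in matches], r)
    cov_b = window_union(img_b, [m[1] for m in matches], r)
    d_local = hist_distance(
        channel_histograms((img_a[y][x] for x, y in cov_a), bins),
        channel_histograms((img_b[y][x] for x, y in cov_b), bins),
    )
    # Weight = fraction of image A covered by the correspondence windows.
    weight = len(cov_a) / (len(img_a) * len(img_a[0]))
    return weight * d_local + (1.0 - weight) * d_global
```

Because both terms lie in [0, 1], the weighted sum stays on the same scale regardless of how many correspondences are found.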

Clinical data analysis in retrospective study through equality adjustment between groups

  • 곽상규;신임희
    • Journal of the Korean Data and Information Science Society / Vol. 26, No. 6 / pp. 1317-1325 / 2015
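  • When analyzing data collected in clinical studies that compare two groups, the effect of a factor on a disease can be examined in two ways. A prospective study divides subjects by the presence or absence of a risk factor that may cause a particular disease and follows them over time. A retrospective study divides subjects by the presence or absence of the disease and identifies risk factors from their past observation records. Although the approach and design differ, both studies aim to identify clear differences between the two groups and, further, to determine which variables influence the classification into the two groups. In particular, when comparing two groups in clinical research, the between-group differences and effects of clinical variables should be examined while controlling for basic characteristics such as sex and age. However, a retrospective study is based on past observation records, so subjects cannot be randomly assigned to the two groups, and the groups frequently differ in these basic characteristics. To address this, clinical data are analyzed using covariates. Representative covariate-based methods include analysis of covariance, adjusted regression models, and propensity score matching. This study introduces covariate-based data analysis methods and propensity score matching for retrospective clinical studies. It then demonstrates their necessity by applying them to real data on recurrence in gastric cancer patients.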
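The following is a minimal sketch of the matching step of propensity score matching. It assumes the scores have already been estimated, for example by a logistic regression of group membership on covariates such as sex and age. It uses greedy 1:1 nearest-neighbour matching without replacement and an illustrative caliper of 0.05, which is one common variant and not necessarily the one used in the study. A standardized mean difference is included as a balance check. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def greedy_caliper_match(treated: Dict[str, float], control: Dict[str, float],
                         caliper: float = 0.05) -> List[Tuple[str, str]]:
    """1:1 nearest-neighbour matching on propensity score, without replacement.

    Treated units are processed from highest to lowest score (one common
    ordering); a pair is kept only if the score gap is within the caliper.
    """
    available = dict(control)
    pairs = []
    for tid, ps in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        cid = min(available, key=lambda c: abs(available[c] - ps))
        if abs(available[cid] - ps) <= caliper:
            pairs.append((tid, cid))
            del available[cid]
    return pairs


def standardized_mean_difference(x_t: Sequence[float], x_c: Sequence[float]) -> float:
    """Balance diagnostic: (mean_t - mean_c) / sqrt((var_t + var_c) / 2)."""
    def mean_var(v: Sequence[float]) -> Tuple[float, float]:
        m = sum(v) / len(v)
        return m, sum((a - m) ** 2 for a in v) / (len(v) - 1)

    mt, vt = mean_var(x_t)
    mc, vc = mean_var(x_c)
    return (mt - mc) / math.sqrt((vt + vc) / 2)
```

After matching, the standardized mean difference of each baseline covariate, computed on the matched pairs, should move toward zero if the groups have been balanced.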

Application of Constrained Bayes Estimation under Balanced Loss Function in Insurance Pricing

  • Kim, Myung Joon;Kim, Yeong-Hwa
    • Communications for Statistical Applications and Methods / Vol. 21, No. 3 / pp. 235-243 / 2014
  • Constrained Bayes estimates overcome the over-shrinkage toward the mean that the usual Bayes and empirical Bayes estimates produce, by matching the first and second empirical moments. Consequently, a constrained Bayes estimate is recommended when the research objective is to produce a histogram of the estimates that reflects both location and dispersion. The well-known squared error loss function emphasizes only the precision of estimation and may lead to biased estimators. The balanced loss function has therefore been suggested to reflect both goodness of fit and precision of estimation. In insurance pricing, both accurate location estimates of risk and dispersion estimates for each risk group should be considered under a proper loss function. In this paper, applying these two ideas, we discuss the benefits of constrained Bayes estimates and the balanced loss function, and we demonstrate their effectiveness in practice through an analysis of real insurance accident data.
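The two ingredients named above can be sketched as follows, assuming the posterior means and variances of the risk parameters are already available from some model (not shown) and the posteriors are independent. The first function is the standard Louis/Ghosh-style constrained Bayes adjustment. It keeps the average of the posterior means and stretches their spread so that it matches the posterior expectation of the true spread. The second is the Bayes rule under Zellner's balanced loss. How the paper combines the two, and its choice of target estimator, are not reproduced here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def constrained_bayes(post_means: Sequence[float], post_vars: Sequence[float]) -> List[float]:
    """Constrained Bayes adjustment assuming independent posteriors.

    The estimates keep the mean of the posterior means and are stretched so that
    sum((d_i - m_bar)^2) equals E[sum((theta_i - theta_bar)^2) | data]
    = sum((m_i - m_bar)^2) + (1 - 1/m) * sum(v_i).
    Requires that the posterior means are not all equal.
    """
    m = len(post_means)
    m_bar = sum(post_means) / m
    spread = sum((t - m_bar) ** 2 for t in post_means)
    a = math.sqrt(1.0 + (1.0 - 1.0 / m) * sum(post_vars) / spread)
    return [m_bar + a * (t - m_bar) for t in post_means]


def balanced_loss_bayes(target: Sequence[float], post_means: Sequence[float],
                        omega: float) -> List[float]:
    """Bayes rule under the balanced loss
    omega * sum((d - delta0)^2) + (1 - omega) * sum((d - theta)^2),
    which is omega * delta0 + (1 - omega) * posterior mean."""
    return [omega * d0 + (1.0 - omega) * t for d0, t in zip(target, post_means)]
```

The factor `a` is always at least 1, which is exactly how the constrained estimates undo the over-shrinkage of the ordinary Bayes estimates.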

Cluster Analysis with Balancing Weight on Mixed-type Data

  • Chae, Seong-San;Kim, Jong-Min;Yang, Wan-Youn
    • Communications for Statistical Applications and Methods / Vol. 13, No. 3 / pp. 719-732 / 2006
  • We present a set of clustering algorithms that place an appropriate weight on the distance formulation, extending it to data with mixed numeric and multiple binary attributes. The simple matching and Jaccard coefficients are used to measure the similarity between objects on the multiple binary attributes. These similarities are then converted to dissimilarities between the i-th and j-th objects. We demonstrate the performance of the clustering algorithms with balancing weights under the different similarity measures. Our experiments show that clustering algorithms with an appropriate weight achieve a competitive recovery level when data with mixed numeric and multiple binary attributes are clustered.
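The following minimal sketch shows the similarity-to-dissimilarity conversion described above. The two coefficients use the usual 2x2 agreement counts: a = both 1, b = 1/0, c = 0/1, d = both 0. The combination rule is an assumption. A single scalar balancing weight mixes a Euclidean distance on numeric attributes that have already been scaled with one minus the binary similarity. The paper's actual weighting scheme is not given here, and the function names are hypothetical. The resulting dissimilarity could be fed to any distance-based clustering method, such as k-medoids or hierarchical clustering.

```python
import math
from typing import Callable, Sequence, Tuple


def binary_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d): counts of 1-1, 1-0, 0-1 and 0-0 attribute pairs."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    return a, b, c, d


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Share of attributes on which the two objects agree."""
    a, b, c, d = binary_counts(x, y)
    return (a + d) / (a + b + c + d)


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Joint presences over attributes present in at least one object."""
    a, b, c, _ = binary_counts(x, y)
    # Convention assumed here: two all-zero vectors are treated as identical.
    return a / (a + b + c) if a + b + c else 1.0


def mixed_dissimilarity(num_x: Sequence[float], num_y: Sequence[float],
                        bin_x: Sequence[int], bin_y: Sequence[int],
                        weight: float = 0.5,
                        similarity: Callable[[Sequence[int], Sequence[int]], float] = jaccard
                        ) -> float:
    """weight * Euclidean distance on (pre-scaled) numeric attributes
    + (1 - weight) * (1 - similarity) on binary attributes."""
    d_num = math.sqrt(sum((u - v) ** 2 for u, v in zip(num_x, num_y)))
    return weight * d_num + (1.0 - weight) * (1.0 - similarity(bin_x, bin_y))
```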

The Use of Generalized Gamma-Polynomial Approximation for Hazard Functions

  • Ha, Hyung-Tae
    • 응용통계연구 (The Korean Journal of Applied Statistics) / Vol. 22, No. 6 / pp. 1345-1353 / 2009
  • We introduce a simple methodology, the so-called generalized gamma-polynomial approximation, which is based on a moment-matching technique and approximates survival and hazard functions in the context of parametric survival analysis. We use the generalized gamma-polynomial approximation to approximate the density and distribution functions of convolutions and finite mixtures of random variables, and from these we obtain the approximate survival and hazard functions. The technique provides very accurate approximations to the target functions and is computationally efficient and easy to implement. Moreover, the generalized gamma-polynomial approximations are very stable in the middle range of the target distributions, whereas saddlepoint approximations are often unstable in a neighborhood of the mean.
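The following is a minimal sketch of the general polynomial-adjusted moment-matching idea. A generalized gamma base density ψ(x) with parameters (α, β, γ) is multiplied by a degree-n polynomial Σ ξ_j x^j. The coefficients ξ_j solve the linear moment equations Σ_j ξ_j E_ψ[X^(i+j)] = μ_i for i = 0..n, where μ_i are the target moments and μ_0 = 1. The sketch assumes the base parameters are supplied by the user. It also computes the survival function by Simpson's rule up to a finite upper limit, which is an illustrative choice. The paper's way of choosing the base parameters and degree is not reproduced here, and the function names are hypothetical. Polynomial-adjusted densities can also turn slightly negative in the tails.

```python
import math
from typing import List, Sequence


def gg_moment(k: float, alpha: float, beta: float, gam: float) -> float:
    """E[X^k] = beta^k * Gamma(alpha + k/gam) / Gamma(alpha) for the density below."""
    return beta ** k * math.gamma(alpha + k / gam) / math.gamma(alpha)


def gg_density(x: float, alpha: float, beta: float, gam: float) -> float:
    """gam * x^(alpha*gam - 1) * exp(-(x/beta)^gam) / (beta^(alpha*gam) * Gamma(alpha)), x > 0."""
    return (gam * x ** (alpha * gam - 1) * math.exp(-((x / beta) ** gam))
            / (beta ** (alpha * gam) * math.gamma(alpha)))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def poly_coefficients(target_moments: Sequence[float], alpha: float,
                      beta: float, gam: float) -> List[float]:
    """xi_0..xi_n such that psi(x) * sum_j xi_j x^j reproduces target_moments[0..n]."""
    n = len(target_moments)
    mat = [[gg_moment(i + j, alpha, beta, gam) for j in range(n)] for i in range(n)]
    return solve(mat, list(target_moments))


def approx_density(x: float, coeffs: Sequence[float], alpha: float,
                   beta: float, gam: float) -> float:
    return gg_density(x, alpha, beta, gam) * sum(c * x ** j for j, c in enumerate(coeffs))


def approx_hazard(t: float, coeffs: Sequence[float], alpha: float, beta: float,
                  gam: float, upper: float, steps: int = 2000) -> float:
    """h(t) = f(t) / S(t), with S(t) = integral of f from t to `upper` (Simpson; steps even)."""
    h = (upper - t) / steps
    s = sum((1 if i in (0, steps) else 4 if i % 2 else 2)
            * approx_density(t + i * h, coeffs, alpha, beta, gam) for i in range(steps + 1))
    return approx_density(t, coeffs, alpha, beta, gam) / (s * h / 3)
```

When the target moments equal the base density's own moments, the coefficients reduce to (1, 0, ..., 0) and the approximation is the base density itself. This is a convenient sanity check.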

A Study of Association Rule Mining by Clustering through Data Fusion

  • Cho, Kwang-Hyun;Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / Vol. 18, No. 4 / pp. 927-935 / 2007
  • Gyeongnam Province currently conducts a social index survey of its residents every year. However, this survey limits analysis because different surveys are conducted on a three-year cycle. Data fusion offers a solution to this problem. Data fusion is the process of combining multiple data sources to provide information of tactical value to the user. However, data fusion is not an end in itself, so efficient analysis of the fused data is also important. In this study, we present a data fusion method for statistical survey data. We also suggest a methodology for applying association rule mining by clustering to the fused statistical survey data.

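The following is a minimal sketch of the two steps described above. For the fusion step, it assumes nearest-neighbour hot-deck statistical matching: each recipient record copies the donor-only variables from the donor closest on the common variables. This is one common fusion technique and not necessarily the paper's. For the mining step, it shows the support and confidence of a single rule. To mine rules "by clustering", one would first cluster the fused records and then evaluate rules within each cluster; that clustering step is not shown. Variable and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def hot_deck_fuse(recipients: List[Dict], donors: List[Dict],
                  common: Sequence[str], donor_only: Sequence[str]) -> List[Dict]:
    """Copy donor_only variables from the donor nearest (squared Euclidean
    distance) on the numeric common variables."""
    fused = []
    for rec in recipients:
        best = min(donors, key=lambda d: sum((rec[k] - d[k]) ** 2 for k in common))
        out = dict(rec)
        out.update({k: best[k] for k in donor_only})
        fused.append(out)
    return fused


def support_confidence(transactions: Sequence[Set[str]], antecedent: Iterable[str],
                       consequent: Iterable[str]) -> Tuple[float, float]:
    """support = P(A and B), confidence = P(B | A) over the given transactions."""
    a = set(antecedent)
    both = a | set(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if both <= t)
    support = n_both / len(transactions) if transactions else 0.0
    confidence = n_both / n_a if n_a else 0.0
    return support, confidence
```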

Change-point Estimation based on Log Scores

  • Kim, Jaehee;Seo, Hyunjoo
    • Communications for Statistical Applications and Methods / Vol. 9, No. 1 / pp. 75-86 / 2002
  • We consider the problem of estimating the change-point in a mean change model with one change-point. Gombay and Huskova (1998) derived a class of change-point estimators based on score functions of ranks. We suggest a change-point estimator based on a log score function of ranks and show that it belongs to the class of Gombay and Huskova (1998). When the change-point occurs in the middle of the sample, simulation results show that the proposed estimator has a smaller RMSE and a larger proportion of estimates matching the true change-point than the other estimators considered in the experiment.
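The following is a minimal sketch of a generic rank-score CUSUM change-point estimator. Each observation is replaced by a score of its normalized rank, the centred scores are accumulated, and the estimate is the split that maximizes the standardized absolute cumulative sum. The score function is a parameter and defaults to Wilcoxon scores. A log-based score could be passed instead, but the exact log score and weighting used in the paper and by Gombay and Huskova are not reproduced here. The function names are hypothetical, and ties are assumed absent (continuous data).

```python
import math
from typing import Callable, List, Sequence


def ranks(x: Sequence[float]) -> List[int]:
    """1-based ranks; assumes no ties."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0] * len(x)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def wilcoxon_score(u: float) -> float:
    return u - 0.5


def rank_cusum_changepoint(x: Sequence[float],
                           score: Callable[[float], float] = wilcoxon_score) -> int:
    """Estimate tau (number of observations before the mean change) as the k
    maximizing |S_k| / sqrt(k (n - k) / n), where S_k is the cumulative sum of
    centred scores a_i = score(R_i / (n + 1))."""
    n = len(x)
    a = [score(r / (n + 1)) for r in ranks(x)]
    a_bar = sum(a) / n
    best_k, best_val, s = 1, -1.0, 0.0
    for k in range(1, n):
        s += a[k - 1] - a_bar
        val = abs(s) / math.sqrt(k * (n - k) / n)
        if val > best_val:
            best_k, best_val = k, val
    return best_k
```

Because only ranks enter the statistic, the estimate does not change under any strictly increasing transformation of the data.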

Two-stage imputation method to handle missing data for categorical response variable

  • Jong-Min Kim;Kee-Jae Lee;Seung-Joo Lee
    • Communications for Statistical Applications and Methods / Vol. 30, No. 6 / pp. 577-587 / 2023
  • Conventional categorical data imputation techniques, such as mode imputation, often suffer from overestimation. If a variable has too many categories, multinomial logistic regression imputation may be infeasible because of computational limitations. To overcome these limitations, we propose a two-stage imputation method. In the first stage, we apply the Boruta variable selection method to the complete dataset to identify variables that are significant for the target categorical variable. In the second stage, we use these important variables to impute missing data: logistic regression for binary variables, polytomous regression for categorical variables, and predictive mean matching for quantitative variables. Through analyses of asymmetric and non-normal simulated data and of real data, we show that the two-stage imputation method outperforms imputation methods that lack variable selection, as measured by accuracy. In an analysis of real survey data, we also show that the proposed two-stage imputation method is more accurate than the current imputation approach.
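The following is a minimal sketch of predictive mean matching for a quantitative variable. It assumes a single predictor `x`; in the two-stage method, the predictors would be the Boruta-selected variables. It is also a simplified variant that uses the least-squares point estimates rather than drawing the regression parameters from their posterior, as multiple-imputation implementations usually do. Each missing value is filled with the observed value of a randomly chosen donor among the k cases whose predictions are closest. The function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def fit_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Intercept and slope of the simple least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def pmm_impute(x: Sequence[float], y: Sequence[Optional[float]], k: int = 5,
               rng: Optional[random.Random] = None) -> List[Optional[float]]:
    """Predictive mean matching with one predictor x.

    For each missing y_i, take the k observed cases whose predicted values are
    closest to case i's prediction and copy the observed y of one of them,
    chosen at random, so imputed values are always actually observed values.
    """
    rng = rng or random.Random(0)
    obs = [i for i, v in enumerate(y) if v is not None]
    b0, b1 = fit_ols([x[i] for i in obs], [y[i] for i in obs])
    pred = [b0 + b1 * xi for xi in x]
    out = list(y)
    for i, v in enumerate(y):
        if v is None:
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
            out[i] = y[rng.choice(donors)]
    return out
```

Copying real donor values keeps imputations within the observed range, which is one reason predictive mean matching tolerates skewed, non-normal variables.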

A NONPARAMETRIC CHANGE-POINT ESTIMATOR USING WINDOW IN MEAN CHANGE MODEL

  • Kim, Jae-Hee;Jang, Hee-Yoon
    • Journal of applied mathematics & informatics / Vol. 7, No. 2 / pp. 653-664 / 2000
  • We consider the problem of inference about an unknown change-point when the mean changes. We suggest a nonparametric change-point estimator using a window and prove its consistency when the errors come from a distribution with mean zero and a common variance. A simulation study compares the estimators in terms of the mean, the variance, and the proportion of estimates matching the true change-point.
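The following sketch illustrates the window idea and the simulation criteria described above. The estimator is a hypothetical window rule, not the paper's: it picks the split τ that maximizes the absolute difference between the means of the h observations on either side. The simulation reports the three criteria named in the abstract: the mean and variance of the estimates, and the proportion of estimates that exactly match the true change-point. Standard normal errors are used as one instance of zero-mean, common-variance errors. The function names are hypothetical.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def window_changepoint(x: Sequence[float], h: int) -> int:
    """Hypothetical window estimator: the tau maximizing
    |mean(x[tau:tau+h]) - mean(x[tau-h:tau])|."""
    n = len(x)
    best_tau, best_gap = h, -1.0
    for tau in range(h, n - h + 1):
        gap = abs(statistics.fmean(x[tau:tau + h]) - statistics.fmean(x[tau - h:tau]))
        if gap > best_gap:
            best_tau, best_gap = tau, gap
    return best_tau


def simulate(estimator: Callable[[List[float]], int], n: int, tau: int, shift: float,
             reps: int = 500, seed: int = 1) -> Tuple[float, float, float]:
    """Mean, variance and exact-match proportion of the estimates over `reps`
    samples with N(0, 1) errors and a mean shift after observation tau."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) + (shift if i >= tau else 0.0) for i in range(n)]
        estimates.append(estimator(x))
    hits = sum(1 for e in estimates if e == tau) / reps
    return statistics.fmean(estimates), statistics.variance(estimates), hits
```

For example, `simulate(lambda x: window_changepoint(x, 5), n=40, tau=20, shift=3.0)` summarizes the window rule on samples whose mean jumps after the 20th observation. Other estimators with the same signature can be compared the same way.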