• Title/Summary/Keyword: Statistics Matching


Theoretical Peptide Mass Distribution in the Non-Redundant Protein Database of the NCBI

  • Lim Da-Jeong;Oh Hee-Seok;Kim Hee-Bal
    • Genomics & Informatics / v.4 no.2 / pp.65-70 / 2006
  • Peptide mass mapping matches experimentally generated peptide masses against the predicted masses of digested proteins contained in a database. Peptide mass fingerprinting identifies proteins by matching their constituent fragment masses to theoretical peptide masses generated from a protein database, so it is important to know the theoretical mass distribution of the database. However, few studies have reported the peptide mass distribution of a database. We analyzed the peptide mass distribution of the non-redundant protein sequence database of the NCBI after digestion with 15 different types of enzymes. To characterize the peptide mass distribution under different digestion enzymes, a power-law distribution (Zipf's law) was applied. After simulated digestion of the protein database, a rank-frequency plot of peptide fragments was used to fit a Zipf's law curve for each enzyme. As a result, our data appear to fit Zipf's law with statistically significant parameter values.
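
A rank-frequency Zipf fit of the kind described can be sketched as a least-squares regression on the log-log plot. This is an illustrative sketch on synthetic fragment counts, not the paper's actual data or fitting procedure:

```python
import numpy as np

def zipf_fit(frequencies):
    """Fit a Zipf (power-law) curve freq ~ C * rank^(-s) by least squares
    on the log-log rank-frequency plot; returns (s, C)."""
    freqs = np.sort(np.asarray(frequencies, dtype=float))[::-1]  # descending
    ranks = np.arange(1, len(freqs) + 1)
    # linear regression of log(freq) on log(rank)
    slope, intercept = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return -slope, np.exp(intercept)

# synthetic fragment counts drawn from an exact Zipf curve with s = 1.0
counts = 1000.0 / np.arange(1, 101) ** 1.0
s_hat, C_hat = zipf_fit(counts)
print(round(s_hat, 3))  # recovers s = 1.0 on noiseless data
```

On real peptide-count data the fit would be noisy in the tail; the paper's significance claims would come from testing the estimated parameters, which this sketch omits.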

Method of Measuring Color Difference Between Images using Corresponding Points and Histograms (대응점 및 히스토그램을 이용한 영상 간의 컬러 차이 측정 기법)

  • Hwang, Young-Bae;Kim, Je-Woo;Choi, Byeong-Ho
    • Journal of Broadcast Engineering / v.17 no.2 / pp.305-315 / 2012
  • Color correction between two or multiple images is crucial for the development of subsequent algorithms and stereoscopic 3D camera systems. Although various color correction methods have been proposed recently, there are few methods for measuring their performance. In addition, when two images differ in viewpoint because of camera positions, previous performance-measurement methods may not be appropriate. In this paper, we propose a method of measuring the color difference between corresponding images for color correction. The method finds matching points with the same colors between two scenes, accounting for view variation through correspondence searches. We then calculate statistics from the neighborhoods of these matching points to measure color difference. This approach accommodates misalignment of corresponding points, in contrast to a conventional geometric transformation by a single homography. To handle cases where matching points cannot cover the whole image, we also calculate color-difference statistics over the whole image region. Finally, the color difference is computed as a weighted sum of the correspondence-based and whole-region-based measures, with the weight determined by the ratio of the region covered by the correspondence-based comparison.
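
The weighted combination of a local, correspondence-based statistic with a whole-image statistic can be sketched as follows. The patch size, the use of patch means, and the coverage-based weight are illustrative assumptions, not the authors' exact formulation:

```python
import numpy as np

def color_difference(img_a, img_b, matches, patch=3, coverage_weight=None):
    """Measure color difference between two images of the same scene:
    compare small neighborhoods around corresponding points, then blend
    with a whole-image statistic. `matches` is a list of
    ((ya, xa), (yb, xb)) pixel correspondences."""
    h = patch // 2
    local_diffs = []
    for (ya, xa), (yb, xb) in matches:
        pa = img_a[ya - h:ya + h + 1, xa - h:xa + h + 1]
        pb = img_b[yb - h:yb + h + 1, xb - h:xb + h + 1]
        if pa.shape == pb.shape and pa.size:
            # compare patch means, tolerating small misalignment
            local_diffs.append(np.abs(pa.mean(axis=(0, 1)) - pb.mean(axis=(0, 1))).mean())
    local = float(np.mean(local_diffs)) if local_diffs else 0.0
    # whole-image fallback: difference of global channel means
    global_diff = float(np.abs(img_a.mean(axis=(0, 1)) - img_b.mean(axis=(0, 1))).mean())
    # weight by the fraction of the image covered by correspondences
    w = coverage_weight if coverage_weight is not None else \
        min(1.0, len(matches) * patch * patch / img_a[..., 0].size)
    return w * local + (1 - w) * global_diff

img_a = np.zeros((20, 20, 3))
img_b = np.full((20, 20, 3), 10.0)   # uniformly 10 units brighter
d = color_difference(img_a, img_b, [((5, 5), (5, 5)), ((10, 10), (10, 10))])
```

For these two uniform images both components report a difference of 10, so the weight has no visible effect; on real pairs the local and global terms would disagree and the coverage weight decides the blend.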

Clinical data analysis in retrospective study through equality adjustment between groups (후향적연구의 집단 간 동등성확보를 통한 임상자료분석)

  • Kwak, Sang Gyu;Shin, Im Hee
    • Journal of the Korean Data and Information Science Society / v.26 no.6 / pp.1317-1325 / 2015
  • There are two types of clinical research for identifying risk factors for disease using collected data: a prospective study follows subjects forward from the present time, while a retrospective study finds risk factors using subjects' past records. The approaches and study designs differ, but the purpose of both is to identify significant differences between two groups and to find the variables that influence group membership. In particular, when comparing two groups in clinical research, we must examine the effects of clinical variables by group while controlling for baseline characteristics such as age and sex. In a retrospective study, however, differences in baseline characteristics arise more frequently because subjects were not randomly assigned to the two groups. Clinical data analysis uses covariates to address this problem; typical methods are analysis of covariance, adjusted models, and propensity score matching. This study introduces ways of securing equality between groups using covariates in retrospective clinical studies and applies them to gastric cancer recurrence data.
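
The matching step of propensity score matching can be sketched as greedy 1:1 nearest-neighbor pairing on estimated scores. The scores here are given directly; in practice they would come from a logistic regression of group membership on the baseline covariates, and the greedy rule is one of several common matching strategies:

```python
import numpy as np

def propensity_match(scores, treated):
    """Greedy 1:1 nearest-neighbor matching on propensity scores.
    `scores` are estimated P(treatment | covariates); `treated` marks
    the treatment group. Returns (treated_idx, control_idx) pairs."""
    treated_idx = np.where(treated)[0]
    control_idx = list(np.where(~treated)[0])
    pairs = []
    for t in treated_idx:
        if not control_idx:
            break
        # pick the still-unmatched control with the closest score
        j = min(control_idx, key=lambda c: abs(scores[c] - scores[t]))
        pairs.append((t, j))
        control_idx.remove(j)
    return pairs

scores = np.array([0.90, 0.80, 0.85, 0.30, 0.82, 0.10])
treated = np.array([True, True, False, False, False, False])
pairs = propensity_match(scores, treated)
```

After matching, group comparisons proceed on the matched pairs, which balances the baseline characteristics that entered the propensity model.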

Application of Constrained Bayes Estimation under Balanced Loss Function in Insurance Pricing

  • Kim, Myung Joon;Kim, Yeong-Hwa
    • Communications for Statistical Applications and Methods / v.21 no.3 / pp.235-243 / 2014
  • Constrained Bayes estimates overcome the over-shrinkage toward the mean that usual Bayes and empirical Bayes estimates produce, by matching first and second empirical moments; consequently, a constrained Bayes estimate is recommended when the research objective is to produce a histogram of the estimates that reflects both location and dispersion. The well-known squared error loss function exclusively emphasizes precision of estimation and may lead to biased estimators. The balanced loss function is therefore suggested, to reflect both goodness of fit and precision of estimation. In insurance pricing, accurate location estimates of risk and dispersion estimates of each risk group should be considered under a proper loss function. In this paper, by combining these two ideas, we discuss the benefits of constrained Bayes estimates and the balanced loss function, and demonstrate their effectiveness through an analysis of real insurance accident data.
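
The moment-matching adjustment behind constrained Bayes estimation can be sketched as a location-scale correction of the Bayes estimates. In a full analysis the target mean and variance are derived from the posterior; here they are simply inputs, so this is a sketch of the mechanics rather than the paper's estimator:

```python
import numpy as np

def constrained_bayes(bayes_est, target_mean, target_var):
    """Rescale Bayes estimates so their empirical first and second
    moments match target values, counteracting over-shrinkage toward
    the prior mean while preserving the ranking of the estimates."""
    m = bayes_est.mean()
    v = bayes_est.var()
    scale = np.sqrt(target_var / v)
    return target_mean + scale * (bayes_est - m)

# over-shrunk estimates: right mean, but too little spread
est = np.array([1.0, 2.0, 3.0, 4.0])          # empirical variance 1.25
adjusted = constrained_bayes(est, 2.5, 5.0)   # stretch variance to 5.0
```

The adjusted values keep the same center and ordering but spread out, so a histogram of the ensemble no longer understates the dispersion across risk groups.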

Cluster Analysis with Balancing Weight on Mixed-type Data

  • Chae, Seong-San;Kim, Jong-Min;Yang, Wan-Youn
    • Communications for Statistical Applications and Methods / v.13 no.3 / pp.719-732 / 2006
  • A set of clustering algorithms with proper weights in the formulation of distance, extended to mixed numeric and multiple binary values, is presented. Simple matching and Jaccard coefficients are used to measure similarity between objects on multiple binary attributes. Similarities are converted to dissimilarities between the i-th and j-th objects. The performance of clustering algorithms with balancing weights on different similarity measures is demonstrated. Our experiments show that clustering algorithms with properly applied weights give competitive recovery levels when data with mixed numeric and multiple binary attributes are clustered.
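
The two binary similarity coefficients and a weighted mixed-type dissimilarity can be sketched as below. The particular way the weight `w` balances the numeric and binary parts is an illustrative assumption, not the paper's exact formulation:

```python
import numpy as np

def simple_matching(a, b):
    """Fraction of binary attributes on which a and b agree (0-0 counts)."""
    a, b = np.asarray(a), np.asarray(b)
    return np.mean(a == b)

def jaccard(a, b):
    """Agreement on joint presences only; 0-0 matches are ignored."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    union = np.sum(a | b)
    return np.sum(a & b) / union if union else 1.0

def mixed_dissimilarity(num_a, num_b, bin_a, bin_b, w=0.5):
    """Weighted dissimilarity for mixed numeric + binary objects:
    w balances Euclidean distance on the numeric part against
    (1 - similarity) on the binary part."""
    d_num = np.linalg.norm(np.asarray(num_a) - np.asarray(num_b))
    d_bin = 1.0 - simple_matching(bin_a, bin_b)
    return w * d_num + (1 - w) * d_bin

s = simple_matching([1, 0, 1, 0], [1, 0, 0, 0])   # 3 of 4 attributes agree
j = jaccard([1, 0, 1, 0], [1, 0, 0, 0])           # 1 shared presence of 2
d = mixed_dissimilarity([0.0, 0.0], [3.0, 4.0], [1, 0, 1, 0], [1, 0, 0, 0])
```

Because simple matching rewards shared absences while Jaccard does not, the choice between them (and the weight) changes the resulting dissimilarity matrix and hence the clusters recovered.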

The Use of Generalized Gamma-Polynomial Approximation for Hazard Functions

  • Ha, Hyung-Tae
    • The Korean Journal of Applied Statistics / v.22 no.6 / pp.1345-1353 / 2009
  • We introduce a simple methodology, the so-called generalized gamma-polynomial approximation, based on a moment-matching technique for approximating survival and hazard functions in the context of parametric survival analysis. We use the generalized gamma-polynomial approximation to approximate the density and distribution functions of convolutions and finite mixtures of random variables, from which the approximated survival and hazard functions are obtained. The technique provides very accurate approximations to the target functions and is computationally efficient and easy to implement. In addition, the generalized gamma-polynomial approximations are very stable in the middle range of the target distributions, whereas saddlepoint approximations are often unstable in a neighborhood of the mean.
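
The moment-matching base step of such an approximation can be sketched with a plain gamma: choose shape and scale from the first two moments, then form the hazard as density over survival. This omits the polynomial adjustment entirely, and restricts the hazard to integer shapes (Erlang) to keep the survival function in closed form:

```python
import math

def gamma_moment_match(mean, var):
    """Match gamma(shape k, scale theta) to given moments:
    mean = k*theta and var = k*theta^2, so k = mean^2/var, theta = var/mean."""
    return mean ** 2 / var, var / mean

def erlang_hazard(x, k, theta):
    """Hazard h(x) = f(x)/S(x) of a gamma with integer shape k (Erlang),
    using S(x) = exp(-x/theta) * sum_{i<k} (x/theta)^i / i!. A general
    gamma would need the regularized incomplete gamma function instead."""
    k = int(round(k))
    z = x / theta
    pdf = z ** (k - 1) * math.exp(-z) / (math.factorial(k - 1) * theta)
    surv = math.exp(-z) * sum(z ** i / math.factorial(i) for i in range(k))
    return pdf / surv

k, theta = gamma_moment_match(2.0, 4.0)   # k = 1 (exponential), theta = 2
```

With k = 1 the fitted gamma is exponential, so the hazard is constant at 1/theta = 0.5, which is a quick sanity check on both functions.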

A Study of Association Rule Mining by Clustering through Data Fusion

  • Cho, Kwang-Hyun;Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.18 no.4 / pp.927-935 / 2007
  • Currently, Gyeongnam province conducts a social index survey of its residents every year. However, analysis of this survey is limited because different surveys are executed on three-year cycles. The solution to this problem is data fusion. Data fusion is the process of combining multiple data sources in order to provide information of tactical value to the user. Data fusion is not the ultimate result, however; efficient analysis of the fused data is also important. In this study, we present a data fusion method for statistical survey data, and we suggest a methodology for applying association rule mining by clustering through data fusion of statistical survey data.
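
A common fusion step for survey data is nearest-neighbor hot-deck matching: records from one survey receive the unique variables of the most similar donor record from the other survey, where similarity is computed on the variables the two surveys share. This is a generic sketch of that idea, not the specific method of the paper:

```python
import numpy as np

def fuse_surveys(common_a, common_b, unique_b):
    """For each record of survey A, find the survey-B donor closest on
    the common variables and copy the donor's unique variables over.
    `common_a`, `common_b` are (n, p) arrays of shared variables;
    `unique_b` holds the variables observed only in survey B."""
    fused = []
    for row in common_a:
        d = np.linalg.norm(common_b - row, axis=1)   # distance to donors
        fused.append(unique_b[int(np.argmin(d))])    # nearest donor wins
    return np.array(fused)

common_a = np.array([[0.0, 0.0], [5.0, 5.0]])
common_b = np.array([[0.0, 1.0], [5.0, 4.0]])
unique_b = np.array([10.0, 20.0])
fused = fuse_surveys(common_a, common_b, unique_b)
```

The fused file then contains both surveys' variables on one set of records, which is what makes joint analyses such as association rule mining possible.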


Change-point Estimation based on Log Scores

  • Kim, Jaehee;Seo, Hyunjoo
    • Communications for Statistical Applications and Methods / v.9 no.1 / pp.75-86 / 2002
  • We consider the problem of estimating the change-point in a mean change model with one change-point. Gombay and Huskova (1998) derived a class of change-point estimators based on score functions of ranks. A change-point estimator with the log score function of ranks is suggested and is shown to belong to the class of Gombay and Huskova (1998). The simulation results show that the proposed estimator has smaller MSE and a larger proportion of matching the true change-point than the other estimators considered in the experiment when the change-point occurs in the middle of the sample.
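
A rank-score change-point estimator of this general type can be sketched as maximizing the centered cumulative sum of scored ranks. This illustrates the class of estimators, not the exact estimator or normalization studied in the paper:

```python
import numpy as np

def changepoint_rank_score(x, score=np.log):
    """Estimate a single mean change-point by maximizing the absolute
    centered cumulative sum of scored ranks (a Gombay-Huskova-type
    estimator). `score` maps ranks 1..n to scores; np.log gives the
    log score. Returns the 0-based last index of the first segment."""
    x = np.asarray(x, float)
    ranks = np.argsort(np.argsort(x)) + 1   # ranks 1..n
    s = score(ranks)
    s = s - s.mean()                        # center the scores
    cusum = np.cumsum(s)[:-1]               # partial sums, k = 1..n-1
    return int(np.argmax(np.abs(cusum)))

# mean drops after the 10th observation (indices 0..9 form segment one)
x = np.concatenate([np.arange(10) + 100.0, np.arange(10.0)])
cp = changepoint_rank_score(x)
```

Because the log score weights high ranks less sharply than low ones, the estimator's behavior depends on the direction and location of the shift, which is what the paper's simulation study examines.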

Two-stage imputation method to handle missing data for categorical response variable

  • Jong-Min Kim;Kee-Jae Lee;Seung-Joo Lee
    • Communications for Statistical Applications and Methods / v.30 no.6 / pp.577-587 / 2023
  • Conventional categorical data imputation techniques, such as mode imputation, often encounter overestimation issues, and if a variable has too many categories, multinomial logistic regression imputation may be computationally infeasible. To rectify these limitations, we propose a two-stage imputation method. In the first stage, we apply the Boruta variable selection method to the complete dataset to identify variables important for the target categorical variable. In the second stage, we use those important variables with logistic regression to impute missing data in binary variables, polytomous regression to impute missing data in categorical variables, and predictive mean matching to impute missing data in quantitative variables. Through analysis of asymmetric and non-normal simulated and real data, we demonstrate that the two-stage imputation method outperforms imputation methods lacking variable selection, as evidenced by accuracy measures. In the analysis of real survey data, our two-stage imputation method also surpasses the current imputation approach in terms of accuracy.
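
The predictive mean matching step used for quantitative variables can be sketched as follows: fit a linear model on the observed cases, predict for all cases, and impute each missing value with the observed outcome of a randomly chosen near-prediction donor. This shows only that one step; full pipelines such as MICE add proper Bayesian parameter draws:

```python
import numpy as np

def predictive_mean_matching(y, X, missing, k=3, rng=None):
    """Impute missing quantitative y via predictive mean matching:
    regress y on X over observed cases, predict for everyone, and for
    each missing case copy the observed y of one of the k donors whose
    predictions are closest."""
    rng = rng or np.random.default_rng(0)
    obs = ~missing
    Xd = np.column_stack([np.ones(len(y)), X])          # add intercept
    beta, *_ = np.linalg.lstsq(Xd[obs], y[obs], rcond=None)
    pred = Xd @ beta
    y_imp = y.copy()
    donors = np.where(obs)[0]
    for i in np.where(missing)[0]:
        nearest = donors[np.argsort(np.abs(pred[donors] - pred[i]))[:k]]
        y_imp[i] = y[rng.choice(nearest)]               # draw one donor
    return y_imp

y = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 99.0])   # last value is missing
missing = np.array([False, False, False, False, False, True])
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y_imp = predictive_mean_matching(y, X, missing)
```

Because imputed values are always copied from observed donors, PMM cannot produce implausible values outside the observed range, which is a key reason it is preferred for skewed or non-normal quantitative variables.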

A NONPARAMETRIC CHANGE-POINT ESTIMATOR USING WINDOW IN MEAN CHANGE MODEL

  • Kim, Jae-Hee;Jang, Hee-Yoon
    • Journal of applied mathematics & informatics / v.7 no.2 / pp.653-664 / 2000
  • The problem of inference about an unknown change-point in a mean change model is considered. We suggest a nonparametric change-point estimator using a window and prove its consistency when the errors come from a distribution with mean zero and common variance. A comparison study is done by simulation on the mean, the variance, and the proportion of matching the true change-point.
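
A window-based nonparametric estimator in this spirit can be sketched by sliding a window over the series and picking the point where the before-window and after-window means differ most. The specific statistic and window handling here are illustrative assumptions, not the paper's exact estimator:

```python
import numpy as np

def window_changepoint(x, window):
    """Nonparametric change-point estimate in a mean-change model:
    at each candidate point t, compare the mean of the `window`
    observations before t with the mean of the `window` observations
    after t, and return the t with the largest absolute difference
    (the first index of the second segment)."""
    x = np.asarray(x, float)
    n = len(x)
    best, best_diff = window, -np.inf
    for t in range(window, n - window + 1):
        diff = abs(x[t:t + window].mean() - x[t - window:t].mean())
        if diff > best_diff:
            best, best_diff = t, diff
    return best

cp = window_changepoint([0.0] * 15 + [3.0] * 15, window=5)
```

The window width trades bias for variance: a short window localizes the jump sharply but is noisy, while a long window averages out noise but blurs the estimate near the boundaries, which is the behavior the paper's simulation study quantifies.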