• Title/Summary/Keyword: covariance methods


A Study on the Poorly-posed Problems in the Discriminant Analysis of Growth Curve Model

  • Shim, Kyu-Bark
    • Communications for Statistical Applications and Methods
    • /
    • v.9 no.1
    • /
    • pp.87-100
    • /
    • 2002
  • Poorly-posed problems in balanced discriminant analysis are considered. We restrict attention to the case where the number of observations and the number of variables are the same and small. When these problems exist, the maximum likelihood estimates (MLE) are not used to estimate the covariance matrices. Instead of the MLE, an alternative estimator for the covariance matrices is proposed. This alternative method makes use of two regularization parameters, $\lambda$ and $\gamma$. A new test rule for the discriminant function is suggested and examined via a limited but informative simulation study. The simulation study shows that the suggested test rule gives better results than previously suggested methods in terms of the error rate criterion.
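The entry does not spell out the exact form of its two-parameter covariance estimator. As a purely illustrative sketch, a common regularization of this kind (in the spirit of Friedman's regularized discriminant analysis, which also uses a $(\lambda, \gamma)$ pair) shrinks each class covariance toward the pooled covariance with $\lambda$ and then toward a scaled identity with $\gamma$; the function and data below are hypothetical.

```python
import numpy as np

def regularized_covariances(X_by_class, lam, gam):
    """Two-parameter regularized class covariance estimates (RDA-style sketch).

    lam shrinks each class covariance toward the pooled covariance, and gam
    shrinks the result toward a multiple of the identity, keeping the
    estimate well conditioned even when n_k is close to the dimension p.
    """
    S, pooled_sum, n_total = {}, 0.0, 0
    for k, X in X_by_class.items():
        n_k = X.shape[0]
        S[k] = np.cov(X, rowvar=False, bias=True)      # per-class MLE
        pooled_sum = pooled_sum + n_k * S[k]
        n_total += n_k
    S_pooled = pooled_sum / n_total

    p = S_pooled.shape[0]
    Sigma = {}
    for k, S_k in S.items():
        S_lam = (1.0 - lam) * S_k + lam * S_pooled                      # shrink toward pooled
        Sigma[k] = (1.0 - gam) * S_lam + gam * (np.trace(S_lam) / p) * np.eye(p)  # shrink toward identity
    return Sigma

# Hypothetical example: two small classes with p close to n_k
rng = np.random.default_rng(0)
X_by_class = {0: rng.normal(size=(8, 6)), 1: rng.normal(size=(8, 6))}
Sigma_hat = regularized_covariances(X_by_class, lam=0.5, gam=0.3)
```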

Autoregressive Cholesky Factor Modeling for Marginalized Random Effects Models

  • Lee, Keunbaik;Sung, Sunah
    • Communications for Statistical Applications and Methods
    • /
    • v.21 no.2
    • /
    • pp.169-181
    • /
    • 2014
  • Marginalized random effects models (MREM) are commonly used to analyze longitudinal categorical data when the population-averaged effects are of interest. In these models, random effects are used to explain both subject and time variations. Estimation of the random effects covariance matrix is not simple in MREM because of its high dimension and the positive-definiteness constraint. A relatively simple structure, such as a homogeneous AR(1) structure, is often assumed for the correlation; however, this is too strong an assumption, and as a consequence the estimates of the fixed effects can be biased. To avoid this problem, we introduce an approach that models a heterogeneous random effects covariance matrix using a modified Cholesky decomposition. The approach yields parameters that can be modeled easily, without concern that the resulting estimator will fail to be positive definite, and the parameters have a sensible interpretation. We analyze metabolic syndrome data from a Korean Genomic Epidemiology Study using this method.
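As a rough illustration of the modified Cholesky idea (a generic sketch, not the authors' MREM formulation), the covariance matrix $\Sigma$ is parameterized through a unit lower-triangular matrix $T$ of generalized autoregressive parameters and a diagonal matrix $D$ of innovation variances, $T \Sigma T^{\top} = D$, so that any unconstrained choice of the parameters maps back to a positive definite $\Sigma$:

```python
import numpy as np

def covariance_from_cholesky_params(phi, log_d):
    """Rebuild a positive definite covariance from unconstrained parameters.

    Modified Cholesky parameterization: T @ Sigma @ T.T = D, where T is unit
    lower triangular with -phi entries below the diagonal (generalized
    autoregressive parameters) and D = diag(exp(log_d)) holds innovation
    variances. Any real-valued phi and log_d give a valid covariance.
    """
    q = len(log_d)
    T = np.eye(q)
    idx = 0
    for i in range(1, q):
        for j in range(i):
            T[i, j] = -phi[idx]
            idx += 1
    D = np.diag(np.exp(log_d))
    T_inv = np.linalg.inv(T)
    return T_inv @ D @ T_inv.T        # Sigma = T^{-1} D T^{-T}

# Hypothetical parameters for a 3x3 random effects covariance
phi = np.array([0.8, 0.2, 0.5])       # (2,1), (3,1), (3,2) entries of T
log_d = np.array([0.0, -0.3, -0.1])   # log innovation variances
Sigma = covariance_from_cholesky_params(phi, log_d)
print(np.linalg.eigvalsh(Sigma))      # all eigenvalues positive by construction
```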

Statistical Methods for Repeated Measures Data with Three Repeat Factors (반복요인이 3개인 반복측정자료에 대한 통계적 분석방법 -양평 주민 혈압자료를 이용하여-)

  • 강성현;박태성;이성곤;김창훈;김명희;최보율
    • The Korean Journal of Applied Statistics
    • /
    • v.17 no.1
    • /
    • pp.1-12
    • /
    • 2004
  • In this paper, we consider the choice of an appropriate covariance structure for analyzing repeated measures data with three repeat factors, using blood pressure data collected from local residents of Yangpyeong, Gyeonggi-do (2001), and we fit linear mixed models to identify covariates significant for the outcome variable (blood pressure).

Geodesic Clustering for Covariance Matrices

  • Lee, Haesung;Ahn, Hyun-Jung;Kim, Kwang-Rae;Kim, Peter T.;Koo, Ja-Yong
    • Communications for Statistical Applications and Methods
    • /
    • v.22 no.4
    • /
    • pp.321-331
    • /
    • 2015
  • The K-means clustering algorithm is a popular and widely used clustering method. This paper considers a geodesic clustering algorithm for data consisting of symmetric positive definite (SPD) matrices, combining the K-means framework with the Riemannian (non-Euclidean) geometric structure of the SPD manifold. A K-means algorithm has two main steps, which require a dissimilarity measure between two matrix data points and a way of computing centroids for the observations in a cluster. To use the Riemannian structure, we adopt the geodesic distance and the intrinsic mean for SPD matrices. We demonstrate the proposed method through simulations as well as an application to real financial data.
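A minimal sketch of the two ingredients named above, assuming the affine-invariant metric on SPD matrices (the entry does not specify the exact implementation): the geodesic distance $d(A,B) = \|\log(A^{-1/2} B A^{-1/2})\|_F$ and the intrinsic (Karcher) mean computed by a fixed-point iteration. These would replace the Euclidean distance and the arithmetic mean inside an ordinary K-means loop.

```python
import numpy as np
from scipy.linalg import sqrtm, logm, expm

def geodesic_distance(A, B):
    """Affine-invariant geodesic distance between SPD matrices."""
    A_isqrt = np.linalg.inv(sqrtm(A))
    return np.linalg.norm(logm(A_isqrt @ B @ A_isqrt), "fro")

def intrinsic_mean(mats, n_iter=50, step=1.0):
    """Karcher (intrinsic) mean of SPD matrices via fixed-point iteration."""
    M = np.mean(mats, axis=0)                 # start from the Euclidean mean
    for _ in range(n_iter):
        M_sqrt = sqrtm(M)
        M_isqrt = np.linalg.inv(M_sqrt)
        # average the log-maps of the data at the current estimate
        T = np.mean([logm(M_isqrt @ S @ M_isqrt) for S in mats], axis=0)
        M = np.real(M_sqrt @ expm(step * T) @ M_sqrt)   # discard numerical imaginary noise
        if np.linalg.norm(T, "fro") < 1e-10:
            break
    return M
```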

Variable Selection Theorem for the Analysis of Covariance Model (공분산분석 모형에서의 변수선택 정리)

  • Yoon, Sang-Hoo;Park, Jeong-Soo
    • Communications for Statistical Applications and Methods
    • /
    • v.15 no.3
    • /
    • pp.333-342
    • /
    • 2008
  • The variable selection theorem for the linear regression model is extended to the analysis of covariance model. When some regression variables are omitted from the model, the variances of the estimators are reduced, but bias is introduced. An appropriate balance between a biased model and one with large variances is therefore recommended.
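For context, a standard linear-model illustration of this tradeoff (not the theorem proved in the paper): partition the full model as $y = X_1\beta_1 + X_2\beta_2 + \varepsilon$ and fit only the reduced model with $X_1$. The reduced-model estimator $\tilde{\beta}_1$ is generally biased but never has larger variance than the full-model estimator $\hat{\beta}_1$:

$$
\tilde{\beta}_1 = (X_1^{\top}X_1)^{-1}X_1^{\top}y, \qquad
E[\tilde{\beta}_1] = \beta_1 + (X_1^{\top}X_1)^{-1}X_1^{\top}X_2\,\beta_2, \qquad
\operatorname{Var}(\hat{\beta}_1) - \operatorname{Var}(\tilde{\beta}_1) \succeq 0,
$$

so dropping $X_2$ never increases the variance of the remaining coefficient estimates and is harmless only when $\beta_2$ (or $X_1^{\top}X_2$) is small.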

Off-grid direction-of-arrival estimation for wideband noncircular sources

  • Xiaoyu Zhang;Haihong Tao;Ziye Fang;Jian Xie
    • ETRI Journal
    • /
    • v.45 no.3
    • /
    • pp.492-504
    • /
    • 2023
  • Researchers have recently shown an increased interest in estimating the direction-of-arrival (DOA) of wideband noncircular sources, but existing studies have been restricted to subspace-based methods. An off-grid sparse recovery-based algorithm is proposed in this paper to improve the accuracy of existing algorithms in low signal-to-noise ratio situations. The covariance and pseudo covariance matrices can be jointly represented subject to block sparsity constraints by taking advantage of the joint sparsity between the signal components and the off-grid bias. Furthermore, the estimation problem is transformed into a single measurement vector problem using a focusing operation, resulting in a significant reduction in computational complexity. The error threshold of the proposed algorithm and the Cramér-Rao bound for wideband noncircular DOA estimation are derived in detail. The effectiveness and feasibility of the proposed algorithm are demonstrated by simulation results.
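For reference, the two second-order statistics this entry relies on can be estimated from array snapshots as follows (a generic sketch; the grid, focusing matrices, and block-sparse recovery step of the paper are not shown). For a noncircular source the pseudo covariance $E[xx^{\top}]$ is nonzero and carries information beyond the ordinary covariance $E[xx^{H}]$; the array and signal below are hypothetical.

```python
import numpy as np

def covariance_and_pseudo_covariance(X):
    """Sample covariance and pseudo covariance of complex array snapshots.

    X has shape (n_sensors, n_snapshots). For circular signals the pseudo
    covariance vanishes; for noncircular signals it does not, which is the
    extra structure exploited in noncircular DOA estimation.
    """
    n_snapshots = X.shape[1]
    R = X @ X.conj().T / n_snapshots    # covariance        E[x x^H]
    C = X @ X.T / n_snapshots           # pseudo covariance E[x x^T]
    return R, C

# Hypothetical example: one BPSK (real-valued, hence noncircular) source at 20 degrees
rng = np.random.default_rng(2)
n_sensors, n_snapshots = 8, 200
steering = np.exp(1j * np.pi * np.arange(n_sensors) * np.sin(np.deg2rad(20)))
symbols = rng.choice([-1.0, 1.0], size=n_snapshots)          # BPSK symbols
noise = (rng.normal(size=(n_sensors, n_snapshots))
         + 1j * rng.normal(size=(n_sensors, n_snapshots))) / np.sqrt(2)
X = np.outer(steering, symbols) + 0.1 * noise
R, C = covariance_and_pseudo_covariance(X)
```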

Sample size calculations for clustered count data based on zero-inflated discrete Weibull regression models

  • Hanna Yoo
    • Communications for Statistical Applications and Methods
    • /
    • v.31 no.1
    • /
    • pp.55-64
    • /
    • 2024
  • In this study, we consider the sample size determination problem for clustered count data with many zeros. Zero-inflated Poisson and binomial models are commonly used for zero-inflated data; however, in real data the assumptions required by each model may be violated. We calculate the required sample size based on a discrete Weibull regression model that can handle both underdispersed and overdispersed data, using Monte Carlo simulation to compute the required sample size. With the proposed method, a unified model with a low risk of failure can cope with either dispersion type and handle data with many zeros that appear in groups or clusters sharing a common source of variation. A simulation study shows that the proposed method provides accurate results and reveals that the sample size is affected by the skewness of the distribution, the covariance structure of the covariates, and the proportion of zeros. We apply the method to pancreas disorder length of stay data collected in Western Australia.
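The Monte Carlo sample-size idea can be sketched generically as follows. This is a simplified stand-in, not the authors' zero-inflated discrete Weibull regression for clustered data: the simulated test here is an ordinary two-group rank-sum test, and all parameter values are hypothetical. The required n is the smallest candidate at which the estimated power reaches the target.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def r_zi_discrete_weibull(rng, n, pi0, q, beta):
    """Draw zero-inflated discrete Weibull counts with P(Y >= y) = q**(y**beta)."""
    u = rng.uniform(size=n)
    y = np.ceil((np.log1p(-u) / np.log(q)) ** (1.0 / beta)) - 1   # inverse transform
    y = np.maximum(y, 0)
    zeros = rng.uniform(size=n) < pi0                              # extra structural zeros
    return np.where(zeros, 0, y)

def required_sample_size(target_power=0.8, alpha=0.05, n_sim=300, seed=0):
    rng = np.random.default_rng(seed)
    for n in range(20, 500, 20):                                   # candidate group sizes
        rejections = 0
        for _ in range(n_sim):
            y0 = r_zi_discrete_weibull(rng, n, pi0=0.3, q=0.60, beta=0.9)
            y1 = r_zi_discrete_weibull(rng, n, pi0=0.3, q=0.75, beta=0.9)
            if mannwhitneyu(y0, y1).pvalue < alpha:
                rejections += 1
        if rejections / n_sim >= target_power:                     # estimated power
            return n
    return None

print(required_sample_size())
```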

Multivariate Procedure for Variable Selection and Classification of High Dimensional Heterogeneous Data

  • Mehmood, Tahir;Rasheed, Zahid
    • Communications for Statistical Applications and Methods
    • /
    • v.22 no.6
    • /
    • pp.575-587
    • /
    • 2015
  • Developments in data collection techniques result in high dimensional data sets, in which discrimination is an important and commonly encountered problem that is crucial to resolve when the data are heterogeneous (a non-common variance-covariance structure across classes). An example is classifying microbial habitat preferences based on codon/bi-codon usage; habitat preference is important for studying evolutionary genetic relationships and may help industry produce specific enzymes. Most classification procedures assume homogeneity (a common variance-covariance structure for all classes), which is not guaranteed in most high dimensional data sets. We introduce regularized elimination in partial least squares coupled with QDA (rePLS-QDA) for parsimonious variable selection and classification of high dimensional heterogeneous data sets, building on the recently introduced regularized elimination for variable selection in partial least squares (rePLS) and the heterogeneous classification procedure quadratic discriminant analysis (QDA). A comparison of the proposed and existing methods is conducted on a simulated data set; in addition, the proposed procedure is applied to classify microbial habitat preferences by codon/bi-codon usage. Five bacterial habitats (Aquatic, Host Associated, Multiple, Specialized and Terrestrial) are modeled. The classification accuracy for each habitat is satisfactory and ranges from 89.1% to 100% on test data. Interesting codon/bi-codon usages and their mutual interactions influential for the respective habitat preferences are identified. The proposed method also produced results that concur with known biological characteristics and will help researchers better understand the divergence of species.
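A rough sketch of the general pipeline, as a simplified stand-in for rePLS-QDA: variables are ranked here by the magnitude of the PLS regression coefficients rather than by the authors' regularized elimination rule, and the data are synthetic.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def pls_filter_qda(X_train, y_train, X_test, n_keep=20, n_components=3):
    """Select variables with PLS, then classify with QDA.

    PLS coefficients (fit against the class labels) serve as a crude
    importance score; only the n_keep strongest variables enter the
    heterogeneous-covariance classifier (QDA).
    """
    pls = PLSRegression(n_components=n_components)
    pls.fit(X_train, y_train)
    importance = np.abs(pls.coef_).ravel()
    keep = np.argsort(importance)[-n_keep:]          # indices of kept variables
    qda = QuadraticDiscriminantAnalysis()
    qda.fit(X_train[:, keep], y_train)
    return qda.predict(X_test[:, keep]), keep

# Synthetic heterogeneous two-class example
rng = np.random.default_rng(3)
n, p = 200, 100
X = rng.normal(size=(n, p))
y = (X[:, :5].sum(axis=1) + rng.normal(scale=2.0, size=n) > 0).astype(int)
X[y == 1, :5] *= 2.0                                  # class-specific covariance
pred, kept = pls_filter_qda(X[:150], y[:150], X[150:], n_keep=10)
```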

Model selection algorithm in Gaussian process regression for computer experiments

  • Lee, Youngsaeng;Park, Jeong-Soo
    • Communications for Statistical Applications and Methods
    • /
    • v.24 no.4
    • /
    • pp.383-396
    • /
    • 2017
  • The model in our approach assumes that computer responses are a realization of a Gaussian process superimposed on a regression model, called a Gaussian process regression model (GPRM). Selecting a subset of variables, or building a good reduced model, is an important step in classical regression for identifying variables influential to the response and for further analysis such as prediction or classification; one reason to select variables is to prevent over-fitting or under-fitting of the data. The same reasoning and approach apply to the GPRM, but only a few works on variable selection in GPRMs have been done. In this paper, we propose a new algorithm to build a good prediction model among candidate GPRMs. It builds on an algorithm suggested by previous researchers that includes the Welch method. The proposed algorithm selects the non-zero regression coefficients ($\beta$'s) using forward and backward methods along with a Lasso-guided approach. During this process, the covariance parameters ($\theta$'s) pre-selected by the Welch algorithm are held fixed. We illustrate the superiority of the proposed models over the Welch method and non-selection models using four test functions and one real data example. Future extensions are also discussed.
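As a rough sketch of the GPRM ingredients described above (a generic universal-kriging likelihood with a Gaussian correlation function; the correlation parameters $\theta$ are held fixed, as in the entry, and the forward/backward, Lasso-guided search itself is not shown), candidate subsets of regressors F can be compared by their profile log-likelihood:

```python
import numpy as np

def gaussian_corr(X, theta):
    """Gaussian correlation: R_ij = exp(-sum_k theta_k * (x_ik - x_jk)**2)."""
    diff2 = (X[:, None, :] - X[None, :, :]) ** 2
    return np.exp(-np.tensordot(diff2, theta, axes=([2], [0])))

def gprm_profile_loglik(y, F, R):
    """Profile log-likelihood of a GPRM for a fixed correlation matrix R.

    beta_hat is the generalized least squares estimate for the regression
    part F, and sigma2 is profiled out; theta enters only through R.
    """
    n = len(y)
    R_inv = np.linalg.inv(R + 1e-10 * np.eye(n))       # jitter for stability
    beta_hat = np.linalg.solve(F.T @ R_inv @ F, F.T @ R_inv @ y)
    resid = y - F @ beta_hat
    sigma2_hat = resid @ R_inv @ resid / n
    _, logdet = np.linalg.slogdet(R)
    return -0.5 * (n * np.log(sigma2_hat) + logdet + n)

# Hypothetical comparison: intercept-only vs intercept + x1 regression part
rng = np.random.default_rng(4)
X = rng.uniform(size=(30, 2))
y = 2.0 + 3.0 * X[:, 0] + 0.1 * rng.normal(size=30)
theta = np.array([5.0, 5.0])                           # fixed, Welch-style pre-selected values
R = gaussian_corr(X, theta)
F0 = np.ones((30, 1))                                  # intercept only
F1 = np.column_stack([np.ones(30), X[:, 0]])           # add x1
print(gprm_profile_loglik(y, F0, R), gprm_profile_loglik(y, F1, R))
```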

Inverse Model Parameter Estimation Based on Sensitivity Analysis for Improvement of PM10 Forecasting (PM10 예보 향상을 위한 민감도 분석에 의한 역모델 파라메터 추정)

  • Yu, Suk Hyun;Koo, Youn Seo;Kwon, Hee Yong
    • Journal of Korea Multimedia Society
    • /
    • v.18 no.7
    • /
    • pp.886-894
    • /
    • 2015
  • In this paper, we conduct a sensitivity analysis of the parameters used in inverse modeling in order to estimate the PM10 emissions from 16 areas in East Asia accurately. The parameters considered are R, the observational error covariance matrix, and B, the a priori (background) error covariance matrix. In previous studies these parameters were fixed empirically in advance, which makes it difficult to estimate the emissions accurately; a method is therefore required that automatically determines the most suitable values of R and B from an error measurement criterion and the accuracy of the posterior emissions. We determine the parameters through a sensitivity analysis and improve the accuracy of the posterior emission estimates. The inverse modeling methods used for the emission estimation are the pseudo inverse, NNLS (nonnegative least squares), and BA (Bayesian approach). The pseudo inverse has a small error but can produce negative emissions; NNLS is used to resolve this, but it still yields unrealistic emissions. These problems are resolved with the Bayesian approach. We show the effectiveness and accuracy of the three methods through case studies.
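A minimal sketch of the three estimators named in this entry, for a linear observation model y ≈ K x relating emissions x to observations y (the source-receptor matrix K, the prior x_b, and the covariances below are hypothetical stand-ins, not the study's actual inputs):

```python
import numpy as np
from scipy.optimize import nnls

def pseudo_inverse_estimate(K, y):
    """Least squares via the Moore-Penrose pseudo inverse; may go negative."""
    return np.linalg.pinv(K) @ y

def nnls_estimate(K, y):
    """Nonnegative least squares: min ||K x - y|| subject to x >= 0."""
    x, _ = nnls(K, y)
    return x

def bayesian_estimate(K, y, x_b, B, R):
    """Posterior mean for a Gaussian prior N(x_b, B) and noise N(0, R)."""
    G = B @ K.T @ np.linalg.inv(K @ B @ K.T + R)     # gain matrix
    return x_b + G @ (y - K @ x_b)

# Hypothetical 10-observation, 4-source example
rng = np.random.default_rng(5)
K = rng.uniform(size=(10, 4))
x_true = np.array([3.0, 1.0, 0.5, 2.0])
y = K @ x_true + 0.05 * rng.normal(size=10)
x_b = np.ones(4)                                     # a priori (background) emissions
B = 0.5 * np.eye(4)                                  # background error covariance
R = 0.01 * np.eye(10)                                # observational error covariance
for est in (pseudo_inverse_estimate(K, y), nnls_estimate(K, y),
            bayesian_estimate(K, y, x_b, B, R)):
    print(est)
```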