• Title/Abstract/Keywords: Multivariate Statistical Analysis

A Short Note on Empirical Penalty Term Study of BIC in K-means Clustering Inverse Regression

  • Ahn, Ji-Hyun;Yoo, Jae-Keun
    • Communications for Statistical Applications and Methods / Vol. 18, No. 3 / pp. 267-275 / 2011
  • According to recent studies, the Bayesian information criterion (BIC) has been proposed for determining the structural dimension of the central subspace through sliced inverse regression (SIR) with high-dimensional predictors. The BIC may also be useful in K-means clustering inverse regression (KIR) with high-dimensional predictors. However, applying the BIC directly to KIR may be problematic, because the slicing scheme in SIR is not the same as that of KIR. In this paper, we present empirical studies of the penalty term of the BIC in KIR to identify the most appropriate one. Numerical studies and a real data analysis are presented.

Partitioning likelihood method in the analysis of non-monotone missing data

  • Kim Jae-Kwang
    • The Korean Statistical Society: Conference Proceedings / Proceedings of the Korean Statistical Society Conference, 2004 / pp. 1-8 / 2004
  • We address the problem of parameter estimation in multivariate distributions under ignorable non-monotone missing data. The factoring likelihood method for monotone missing data, so termed by Rubin (1974), is extended to the more general case of non-monotone missing data. The proposed method is algebraically equivalent to the Newton-Raphson method for the observed likelihood, but avoids the burden of computing the first and second partial derivatives of the observed likelihood. Instead, the maximum likelihood estimates and their information matrices for each partition of the data set are computed separately and combined naturally using the generalized least squares method. A numerical example is also presented to illustrate the method.

Bayesian Analysis of Multivariate Threshold Animal Models Using Gibbs Sampling

  • Lee, Seung-Chun;Lee, Deukhwan
    • Journal of the Korean Statistical Society / Vol. 31, No. 2 / pp. 177-198 / 2002
  • The estimation of variance components or variance ratios in linear models is an important issue in plant and animal breeding, and various methods have been devised to estimate them. However, many traits of economic importance in those fields are observed as dichotomous or polychotomous outcomes, for which the usual estimation methods might not be appropriate. Recently, the threshold linear model has come to be regarded as an important tool for analyzing discrete traits, especially in animal breeding. In this note, we consider a hierarchical Bayesian method for the threshold animal model. A Gibbs sampler for making full Bayesian inferences about random effects as well as fixed effects is described for the joint analysis of discrete and continuous traits. A numerical example is provided for a model with two discrete ordered categorical traits, calving ease of calves born to heifers and calving ease of calves born to cows, and one normally distributed trait, birth weight.

Variable Selection with Nonconcave Penalty Function on Reduced-Rank Regression

  • Jung, Sang Yong;Park, Chongsun
    • Communications for Statistical Applications and Methods / Vol. 22, No. 1 / pp. 41-54 / 2015
  • In this article, we propose nonconcave penalties on a reduced-rank regression model to select variables and estimate coefficients simultaneously. We apply the HARD (hard thresholding) and SCAD (smoothly clipped absolute deviation) penalty functions, which are symmetric, have singularities at the origin, and are bounded by a constant to reduce bias. In our simulation study and real data analysis, the new method is compared with an existing variable selection method using the $L_1$ penalty and exhibits competitive performance in prediction and variable selection. Instead of using only one type of penalty function, we use two or three penalty functions simultaneously, combining the advantages of the different penalty types to select relevant predictors and to improve estimation and overall model fit.
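
    The two penalties named in this abstract can be sketched as follows. This is a minimal illustration using the standard textbook forms of the hard-thresholding and SCAD penalties; the function names, the default a = 3.7, and the parametrization are assumptions and are not taken from the article itself.

```python
import numpy as np

def hard_penalty(theta, lam):
    """Hard-thresholding penalty: lam^2 - (|theta| - lam)^2 * 1{|theta| < lam}."""
    t = np.abs(np.asarray(theta, dtype=float))
    return lam**2 - (t - lam) ** 2 * (t < lam)

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty in its usual piecewise form (a > 2; 3.7 is a common default)."""
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t                                          # |theta| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |theta| <= a*lam
    const = (a + 1) * lam**2 / 2                              # |theta| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))

# Both penalties are symmetric in theta, have a kink at the origin, and level
# off at a constant for large |theta|, unlike the L1 penalty lam * |theta|.
grid = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
hard_values = hard_penalty(grid, lam=1.0)
scad_values = scad_penalty(grid, lam=1.0)
```

    Because both penalties are constant beyond a threshold, large coefficients are not shrunk further, which is the bias reduction the abstract refers to.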

Global and Local Views of the Hilbert Space Associated to Gaussian Kernel

  • Huh, Myung-Hoe
    • Communications for Statistical Applications and Methods / Vol. 21, No. 4 / pp. 317-325 / 2014
  • Consider a nonlinear transform $\Phi(x)$ of $x$ in $\mathbb{R}^p$ to a Hilbert space $H$, and assume that the dot product between $\Phi(x)$ and $\Phi(x')$ in $H$ is given by $\langle \Phi(x), \Phi(x') \rangle = K(x, x')$. The aim of this paper is to propose a mathematical technique for taking screen shots of a multivariate dataset mapped to the Hilbert space $H$, particularly suited to the Gaussian kernel $K(\cdot, \cdot)$, which is defined by $K(x, x') = \exp(-\sigma \|x - x'\|^2)$, $\sigma > 0$. Several numerical examples are given.
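
    The Gaussian kernel defined in this abstract can be evaluated on a small dataset as in the sketch below. The function name and the toy data are assumptions for illustration; the paper's own visualization technique is not reproduced here.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    """Return K with K[i, j] = exp(-sigma * ||x_i - x_j||^2) for the rows of X."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X**2, axis=1)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from rounding
    return np.exp(-sigma * sq_dists)

# Hypothetical toy data: three points in R^2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_kernel_matrix(X, sigma=0.5)

# Squared distances between the mapped points in H follow from the kernel:
# ||Phi(x) - Phi(x')||^2 = K(x, x) + K(x', x') - 2 K(x, x') = 2 - 2 K(x, x').
sq_dist_in_H = 2.0 - 2.0 * K
```

    Since $K(x, x) = 1$ for every $x$, all mapped points have unit norm in $H$, so the kernel matrix alone determines their mutual distances.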

Regular Polyprism Parallel Coordinate Plot as a Statistical Graphics Tool

  • Jang, Dae-Heung (장대흥)
    • The Korean Journal of Applied Statistics / Vol. 21, No. 4 / pp. 695-704 / 2008
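  • The parallel coordinate plot is one method of visualizing multivariate data. It overcomes the difficulty of displaying data in a Cartesian coordinate system of four or more dimensions. However, depending on the arrangement of the variable axes, the same data can be interpreted differently. As one way of addressing this variable selection problem, we propose the regular polyprism parallel coordinate plot.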

Graphical Representation of Partially Ranked Data

  • Han, Sang-Tae
    • Communications for Statistical Applications and Methods / Vol. 18, No. 5 / pp. 637-644 / 2011
  • Partially ranked data refers to the situation in which there are p distinct objects, but each judge specifies only the first s (s < p) choices. The group theoretic formulation for partially ranked data analysis was set up by Critchlow (1985). We propose a graphical method for partially ranked data by quantifying objects and judges. In a plot for judges, the interpoint distances can be interpreted as Spearman or Kendall distances between the rankings given by the respective judges. Similarly, we also construct a plot for objects with a sensible relationship to the plot for judges. This study extends the Han and Huh (1995) quantification method for fully ranked data using Gabriel's (1971) biplot technique for a multivariate data matrix.
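
    For orientation, the two distances mentioned in this abstract can be sketched for fully ranked data as follows. The sketch assumes that "Spearman distance" means the Euclidean distance between rank vectors; the extension to partial rankings used in the paper is not attempted, and the judges and their rankings are hypothetical.

```python
from itertools import combinations
import math

def kendall_distance(r1, r2):
    """Number of object pairs that the two rank vectors order differently."""
    return sum(
        1
        for i, j in combinations(range(len(r1)), 2)
        if (r1[i] - r1[j]) * (r2[i] - r2[j]) < 0
    )

def spearman_distance(r1, r2):
    """Euclidean distance between two rank vectors (assumed definition)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))

# Hypothetical full rankings of four objects: entry k is the rank given to object k.
judge_a = [1, 2, 3, 4]
judge_b = [2, 1, 3, 4]   # swaps the first two objects
judge_c = [4, 3, 2, 1]   # complete reversal of judge_a

d_ab = kendall_distance(judge_a, judge_b)
d_ac = kendall_distance(judge_a, judge_c)
```

    Here judge_b differs from judge_a by a single adjacent swap, while judge_c disagrees on every pair, so a plot of judges based on either distance would place judge_b close to judge_a and judge_c far from both.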

Use of Beta-Polynomial Approximations for Variance Homogeneity Test and a Mixture of Beta Variates

  • Ha, Hyung-Tae;Kim, Chung-Ah
    • Communications for Statistical Applications and Methods / Vol. 16, No. 2 / pp. 389-396 / 2009
  • We discuss approximations for the null distribution of a test statistic arising in multivariate analysis for testing homogeneity of variances, and for a mixture of two beta distributions. The approximations use the product of a beta baseline density function and a polynomial adjustment, the so-called beta-polynomial density approximant. Explicit representations of the density and distribution approximants of interest can easily be obtained in each case. Beta-polynomial density approximants produce good approximations over the entire range of the test statistic and can accommodate even a bimodal distribution, as shown with an artificial example of a mixture of two beta distributions.

A Comparative Study on the Bankruptcy Prediction Power of Statistical Model and AI Models: MDA, Inductive, Neural Network

  • Lee, Kun Chang (이건창)
    • Journal of the Korean Operations Research and Management Science Society / Vol. 18, No. 2 / pp. 57-81 / 1993
  • This paper analyzes the bankruptcy prediction power of three methods: Multivariate Discriminant Analysis (MDA), Inductive Learning, and Neural Network. MDA is well known for its effectiveness in predicting bankruptcy in the accounting field. However, it requires rigorous statistical assumptions, so that violating one of them may result in biased outputs. In this respect, we propose two AI models as alternatives for bankruptcy prediction: inductive learning and neural network. To compare the performance of these two AI models with that of MDA, we performed extensive experiments with a number of Korean bankruptcy cases. The experimental results show that the AI models proposed in this study can yield more robust and more generalizable bankruptcy predictions than conventional MDA.

Multiple Testing in Genomic Sequences Using Hamming Distance

  • Kang, Moonsu
    • Communications for Statistical Applications and Methods / Vol. 19, No. 6 / pp. 899-904 / 2012
  • High-dimensional categorical data models with small sample sizes have not been used extensively for genomic sequences that involve count (or discrete) or purely qualitative responses. A basic task is to identify differentially expressed genes (or positions) among a number of genes. This requires an appropriate test statistic and a corresponding multiple testing procedure, since a multivariate analysis of variance is not feasible. The family-wise error rate (FWER) is not appropriate for testing thousands of genes simultaneously in a multiple testing procedure; the false discovery rate (FDR) is better suited than the FWER to such multiple testing problems. The data from the 2002-2003 SARS epidemic show that a conventional FDR procedure combined with the proposed test statistic, which is based on a pseudo-marginal approach with the Hamming distance, performs better.
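
    The contrast between FWER and FDR control drawn in this abstract can be illustrated with a small sketch. It assumes that the "conventional FDR procedure" is the Benjamini-Hochberg step-up rule and uses Bonferroni correction as the FWER counterpart; the p-values are made up for illustration, and the paper's Hamming-distance test statistic is not reproduced.

```python
import numpy as np

def bonferroni_reject(pvalues, alpha=0.05):
    """FWER control: reject H_i when p_i <= alpha / m."""
    p = np.asarray(pvalues, dtype=float)
    return p <= alpha / p.size

def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """FDR control: reject the k smallest p-values, where k is the largest
    rank i with p_(i) <= alpha * i / m (Benjamini-Hochberg step-up rule)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject

# Hypothetical p-values for eight positions in a sequence.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
n_bonferroni = int(bonferroni_reject(pvals).sum())
n_bh = int(benjamini_hochberg_reject(pvals).sum())
```

    On these toy p-values the Bonferroni rule rejects fewer hypotheses than the Benjamini-Hochberg rule, which reflects why FWER control becomes overly conservative when thousands of genes are tested at once.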