• Title/Summary/Keyword: 혼합다변량정규분포

Search Result 7, Processing Time 0.022 seconds

An approximate fitting for mixture of multivariate skew normal distribution via EM algorithm (EM 알고리즘에 의한 다변량 치우친 정규분포 혼합모형의 근사적 적합)

  • Kim, Seung-Gu
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.3
    • /
    • pp.513-523
    • /
    • 2016
  • Fitting a mixture of multivariate skew normal distribution (MSNMix) with multiple skewness parameter vectors via EM algorithm often requires a highly expensive computational cost to calculate the moments and probabilities of multivariate truncated normal distribution in E-step. Subsequently, it is common to fit an asymmetric data set with MSNMix with a simple skewness parameter vector since it allows us to compute them in E-step in an univariate manner that guarantees a cheap computational cost. However, the adaptation of a simple skewness parameter is unrealistic in many situations. This paper proposes an approximate estimation for the MSNMix with multiple skewness parameter vectors that also allows us to treat them in an univariate manner. We additionally provide some experiments to show its effectiveness.

Multivariate empirical distribution plot and goodness-of-fit test (다변량 경험분포그림과 적합도 검정)

  • Hong, Chong Sun;Park, Yongho;Park, Jun
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.4
    • /
    • pp.579-590
    • /
    • 2017
  • The multivariate empirical distribution function could be defined when its distribution function can be estimated. It is known that bivariate empirical distribution functions could be visualized by using Step plot and Quantile plot. In this paper, the multivariate empirical distribution plot is proposed to represent the multivariate empirical distribution function on the unit square. Based on many kinds of empirical distribution plots corresponding to various multivariate normal distributions and other specific distributions, it is found that the empirical distribution plot also depends sensitively on its distribution function and correlation coefficients. Hence, we could suggest five goodness-of-fit test statistics. These critical values are obtained by Monte Carlo simulation. We explore that these critical values are not much different from those in text books. Therefore, we may conclude that the proposed test statistics in this work would be used with known critical values with ease.

Estimation of Spatial Distribution Using the Gaussian Mixture Model with Multivariate Geoscience Data (다변량 지구과학 데이터와 가우시안 혼합 모델을 이용한 공간 분포 추정)

  • Kim, Ho-Rim;Yu, Soonyoung;Yun, Seong-Taek;Kim, Kyoung-Ho;Lee, Goon-Taek;Lee, Jeong-Ho;Heo, Chul-Ho;Ryu, Dong-Woo
    • Economic and Environmental Geology
    • /
    • v.55 no.4
    • /
    • pp.353-366
    • /
    • 2022
  • Spatial estimation of geoscience data (geo-data) is challenging due to spatial heterogeneity, data scarcity, and high dimensionality. A novel spatial estimation method is needed to consider the characteristics of geo-data. In this study, we proposed the application of Gaussian Mixture Model (GMM) among machine learning algorithms with multivariate data for robust spatial predictions. The performance of the proposed approach was tested through soil chemical concentration data from a former smelting area. The concentrations of As and Pb determined by ex-situ ICP-AES were the primary variables to be interpolated, while the other metal concentrations by ICP-AES and all data determined by in-situ portable X-ray fluorescence (PXRF) were used as auxiliary variables in GMM and ordinary cokriging (OCK). Among the multidimensional auxiliary variables, important variables were selected using a variable selection method based on the random forest. The results of GMM with important multivariate auxiliary data decreased the root mean-squared error (RMSE) down to 0.11 for As and 0.33 for Pb and increased the correlations (r) up to 0.31 for As and 0.46 for Pb compared to those from ordinary kriging and OCK using univariate or bivariate data. The use of GMM improved the performance of spatial interpretation of anthropogenic metals in soil. The multivariate spatial approach can be applied to understand complex and heterogeneous geological and geochemical features.

A mixed model for repeated split-plot data (반복측정의 분할구 자료에 대한 혼합모형)

  • Choi, Jae-Sung
    • Journal of the Korean Data and Information Science Society
    • /
    • v.21 no.1
    • /
    • pp.1-9
    • /
    • 2010
  • This paper suggests a mixed-effects model for analyzing split-plot data when there is a repeated measures factor that affects on the response variable. Covariance structures are discussed among the observations because of the assumption of a repeated measures factor as one of explanatory variables. As a plausible covariance structure, compound symmetric covariance structure is assumed for analyzing data. The restricted maximum likelihood (REML)method is used for estimating fixed effects in the model.

혼합설계의 교호작용에 대한 여러 검정법들과 결사평균을 이용하여 변형한 검정법들의 강인성 비교

  • 김현철
    • Communications for Statistical Applications and Methods
    • /
    • v.5 no.3
    • /
    • pp.633-644
    • /
    • 1998
  • 혼합설계의 교호작용에 대한 F 검정이 유효하려면 다표본 구형성(multisample sphericity) 가정과 다변량 정규분포 가정이 만족되어야 한다. F 검정을 실시하기 위한 가정들이 위반된 조건하에서 혼합설계의 교호작용에 대한 검정법들의 1종오류가 비교되었다. 비교된 검정법들은 (1) F 검정(F), (2) 절사평균을 사용한 F 검정($F_T$)(3)$\varepsilon$-수정 F 검정($\varepsilon)$(4) 절사평균을 사용한 $\varepsilon$-수정 F 검정$(\varepsilon_T$) (5) CIGA검정(CIGA), (6) 절사평균을 사용한 CIGA검정($CIGA_T$)이었다. 결과는 CIGA와 $CIGA_T$는 1종오류를 대체로 잘 관리하나, F검정들과 ($\varepsilon$)검정들은 일부 조건에서 아주 작은 1종오류나 아주 큰 1종오류를 갖는 것으로 나타났다.

  • PDF

Double K-Means Clustering (이중 K-평균 군집화)

  • 허명회
    • The Korean Journal of Applied Statistics
    • /
    • v.13 no.2
    • /
    • pp.343-352
    • /
    • 2000
  • In this study. the author proposes a nonhierarchical clustering method. called the "Double K-Means Clustering", which performs clustering of multivariate observations with the following algorithm: Step I: Carry out the ordinary K-means clmitering and obtain k temporary clusters with sizes $n_1$,... , $n_k$, centroids $c_$1,..., $c_k$ and pooled covariance matrix S. $\bullet$ Step II-I: Allocate the observation x, to the cluster F if it satisfies ..... where N is the total number of observations, for -i = 1, . ,N. $\bullet$ Step II-2: Update cluster sizes $n_1$,... , $n_k$, centroids $c_$1,..., $c_k$ and pooled covariance matrix S. $\bullet$ Step II-3: Repeat Steps II-I and II-2 until the change becomes negligible. The double K-means clustering is nearly "optimal" under the mixture of k multivariate normal distributions with the common covariance matrix. Also, it is nearly affine invariant, with the data-analytic implication that variable standardizations are not that required. The method is numerically demonstrated on Fisher's iris data.

  • PDF

Methods for Genetic Parameter Estimations of Carcass Weight, Longissimus Muscle Area and Marbling Score in Korean Cattle (한우의 도체중, 배장근단면적 및 근내지방도의 유전모수 추정방법)

  • Lee, D.H.
    • Journal of Animal Science and Technology
    • /
    • v.46 no.4
    • /
    • pp.509-516
    • /
    • 2004
  • This study is to investigate the amount of biased estimates for heritability and genetic correlation according to data structure on marbling scores in Korean cattle. Breeding population with 5 generations were simulated by way of selection for carcass weight, Longissimus muscle area and latent values of marbling scores and random mating. Latent variables of marbling scores were categorized into five by the thresholds of 0, I, 2, and 3 SD(DSI) or seven by the thresholds of -2, -1, 0,1I, 2, and 3 SD(DS2). Variance components and genetic pararneters(Heritabilities and Genetic correlations) were estimated by restricted maximum likelihood on multivariate linear mixed animal models and by Gibbs sampling algorithms on multivariate threshold mixed animal models in DS1 and DS2. Simulation was performed for 10 replicates and averages and empirical standard deviation were calculated. Using REML, heritabilitis of marbling score were under-estimated as 0.315 and 0.462 on DS1 and DS2, respectively, with comparison of the pararneter(0.500). Otherwise, using Gibbs sampling in the multivariate threshold animal models, these estimates did not significantly differ to the parameter. Residual correlations of marbling score to other traits were reduced with comparing the parameters when using REML algorithm with assuming linear and normal distribution. This would be due to loss of information and therefore, reduced variation on marbling score. As concluding, genetic variation of marbling would be well defined if liability concepts were adopted on marbling score and implemented threshold mixed model on genetic parameter estimation in Korean cattle.