• Title/Summary/Keyword: multivariate data

A Jarque-Bera type test for multivariate normality based on second-power skewness and kurtosis

  • Kim, Namhyun
    • Communications for Statistical Applications and Methods
    • /
    • v.28 no.5
    • /
    • pp.463-475
    • /
    • 2021
  • Desgagné and de Micheaux (2018) proposed an alternative univariate normality test to the Jarque-Bera test. Their statistic is based on the sample second-power skewness and kurtosis, whereas the Jarque-Bera statistic uses the sample Pearson skewness and kurtosis, which are the third and fourth standardized sample moments, respectively. In this paper, we generalize their statistic to a multivariate version based on orthogonalization or an empirical standardization of the data. The proposed multivariate statistic approximately follows a chi-squared distribution. A simulation study shows that the proposed statistic controls the type I error well even for very small sample sizes when critical values from the approximate distribution are used. Its power is comparable to that of the multivariate version of the Jarque-Bera test built on the same orthogonalization idea, and it is much better for some mixed normal alternatives.
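
As a point of reference for the abstract above, the following is a minimal sketch of the classical univariate Jarque-Bera statistic built from the third and fourth standardized sample moments. It is not the second-power statistic of Desgagné and de Micheaux, nor the multivariate version proposed in the paper; the function names are illustrative assumptions.

```python
import math


def jarque_bera(xs):
    """Classical univariate Jarque-Bera statistic (illustrative sketch)."""
    n = len(xs)
    mean = sum(xs) / n
    # Central sample moments with divisor n, as in the usual JB definition.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5   # third standardized moment
    kurtosis = m4 / m2 ** 2     # fourth standardized moment
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def chi2_df2_pvalue(stat):
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


sample = [-2, -1, 0, 1, 2]
jb = jarque_bera(sample)
print(jb, chi2_df2_pvalue(jb))
```

The chi-squared reference distribution with 2 degrees of freedom is only an asymptotic approximation for the classical statistic, so the p-value from this sketch should be read with caution for small samples.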

AUTOMATED ELECTROFACIES DETERMINATION USING MULTIVARIATE STATISTICAL ANALYSIS

  • Kim Jungwhan;Lim Jong-Se
    • 한국석유지질학회:학술대회논문집
    • /
    • spring
    • /
    • pp.10-14
    • /
    • 1998
  • A systematic methodology is developed for electrofacies determination from wireline log data using multivariate statistical analysis. To account for the contribution of each log and to reduce the computational dimension, the multivariate logs are transformed into a single variable through principal components analysis. The resulting principal component logs are segmented using the statistical zonation method to enhance the efficiency and quality of the interpreted results. Hierarchical cluster analysis is then used to group the segments into electrofacies. The optimal number of groups is determined on the basis of the ratio of within-group variance to total variance and on core data. The technique is applied to wells in the Korea Continental Shelf. The field application demonstrates that the lithology predicted from the electrofacies classification matches the core and cutting data with high reliability. This methodology for electrofacies classification can be used to define reservoir characteristics that are helpful for reservoir management.

Random Effects Models for Multivariate Survival Data: Hierarchical-Likelihood Approach

  • Ha Il Do;Lee Youngjo;Song Jae-Kee
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2000.11a
    • /
    • pp.193-200
    • /
    • 2000
  • Modelling dependence via random effects in censored multivariate survival data has recently received considerable attention in the biomedical literature. Random effects models describe not only the conditional survival times but also the conditional hazard rate. Systematic likelihood inference for models with random effects is possible using Lee and Nelder's (1996) hierarchical likelihood (h-likelihood). The purpose of this presentation is to introduce Ha et al.'s (2000a,b) inferential methods for random effects models via the h-likelihood, which provide conceptually simple, numerically efficient, and reliable inferential procedures.

Analysis of Multivariate System Using Mahalanobis Taguchi System (Mahalanobis Taguchi System을 이용한 다변량 시스템의 해석에 관한 연구)

  • Hong, Jung-Eui;Kwon, Hong-Kyu
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.32 no.1
    • /
    • pp.20-25
    • /
    • 2009
  • The Mahalanobis Taguchi System (MTS) is a pattern information technology that has been used in various diagnostic applications to make quantitative decisions by constructing a multivariate measurement scale using data analytic methods, without any assumption about the statistical distribution. The MTS applies Taguchi's fractional factorial design with the Mahalanobis distance (MD) as the performance metric. In this work, MTS is used to analyze the Wisconsin Breast Cancer data, which has ten attributes. Ten different tests are conducted on the data to determine whether or not a patient has cancer. MTS is also used to reduce the number of tests needed to define the relationship between each attribute and the diagnosis result. The accuracy of the diagnosis is compared with that of two previous studies.
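
The Mahalanobis distance that serves as the performance metric can be sketched as below for two-variable data, assuming a reference group of observations from which the mean and sample covariance are estimated. This is only the plain distance, not the full MTS procedure (which also involves scaling and the fractional factorial design); all function names are hypothetical.

```python
import math


def mean_vector(rows):
    """Column means of a list of equal-length tuples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance_2d(rows):
    """Sample covariance (divisor n - 1) of two-column data as (sxx, sxy, syy)."""
    n = len(rows)
    mx, my = mean_vector(rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in rows) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_2d(point, rows):
    """Mahalanobis distance of `point` from the reference group `rows`."""
    mx, my = mean_vector(rows)
    sxx, sxy, syy = covariance_2d(rows)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = [dx dy] S^{-1} [dx dy]^T, with the 2x2 inverse written out explicitly.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


reference = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(mahalanobis_2d((3, 1), reference))
```

For data with more than two attributes, such as the ten-attribute data set mentioned above, the same formula applies with a general matrix inverse in place of the explicit 2x2 one.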

Local T2 Control Charts for Process Control in Local Structure and Abnormal Distribution Data (지역적이고 비정규분포를 갖는 데이터의 공정관리를 위한 지역기반 T2관리도)

  • Kim, Jeong-Hun;Kim, Seoung-Bum
    • Journal of Korean Society for Quality Management
    • /
    • v.40 no.3
    • /
    • pp.337-346
    • /
    • 2012
  • Purpose: A control chart is one of the important statistical process control tools that can improve processes by reducing variability and defects. Methods: In the present study, we propose a local $T^2$ multivariate control chart that efficiently detects abnormal observations by considering the local pattern of the in-control observations. Results: A simulation study was conducted to examine the properties of the proposed control chart and to compare it with existing multivariate control charts. Conclusion: The results demonstrate the usefulness and effectiveness of the proposed control chart.

Projection Pursuit K-Means Visual Clustering

  • Kim, Mi-Kyung;Huh, Myung-Hoe
    • Journal of the Korean Statistical Society
    • /
    • v.31 no.4
    • /
    • pp.519-532
    • /
    • 2002
  • K-means clustering is a well-known method for partitioning multivariate observations. Recently, the method has been widely implemented in data mining software because of its computational efficiency in handling large data sets. However, it does not yield a suitable visual display of multivariate observations, which is especially important in the exploratory stage of data analysis. The aim of this study is to develop a K-means clustering method that enables visual display of multivariate observations in a low-dimensional space, for which the projection pursuit method is adopted. We propose a computationally inexpensive and reliable algorithm and provide two numerical examples.
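
The basic K-means partitioning step referred to above can be sketched as follows. This is plain Lloyd-style K-means with caller-supplied starting centers, assumed here for illustration; it does not include the projection pursuit component that the paper adds, and the function name and signature are hypothetical.

```python
def kmeans(points, centers, max_iter=100):
    """Plain K-means (Lloyd's algorithm) with caller-supplied initial centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center
        # (squared Euclidean distance).
        labels = [
            min(
                range(len(centers)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centers[k])),
            )
            for pt in points
        ]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for k in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[k])
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, labels


data = [(0, 0), (0, 1), (10, 10), (10, 11)]
final_centers, final_labels = kmeans(data, [(0, 0), (10, 10)])
print(final_centers, final_labels)
```

Passing the initial centers explicitly keeps the sketch deterministic; in practice the result of K-means depends on the initialization.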

A Comparison of Methods for the Detection of Outliers in Multivariate Data

  • Hadi, Ali-S.;Joo, Hye-Seon;Son, Mun-S.
    • Communications for Statistical Applications and Methods
    • /
    • v.3 no.2
    • /
    • pp.53-67
    • /
    • 1996
  • Numerous classical as well as robust methods have been proposed in the literature for the detection of multiple outliers in multivariate data. The effectiveness and power of each of these methods have not been thoroughly investigated. In this paper, we first reduce the vast number of outlier detection methods to a small number of viable ones. This reduction is based on the previous work of other researchers and on some theoretical arguments. We then design and implement a Monte Carlo experiment to compare these methods. The main goal of our study is to determine which methods are most powerful in detecting multiple outliers and in dealing with the masking and swamping problems. The results of the Monte Carlo study indicate that two of the methods seem to perform better than the others in detecting multiple outliers in multivariate data.

Depth-Based rank test for multivariate two-sample scale problem

  • Digambar Tukaram Shirke;Swapnil Dattatray Khorate
    • Communications for Statistical Applications and Methods
    • /
    • v.30 no.3
    • /
    • pp.227-244
    • /
    • 2023
  • In this paper, a depth-based nonparametric test for the multivariate two-sample scale problem is proposed. The proposed test statistic is based on depth-induced ranks and is thus distribution-free. The depth values of the data points of one sample are calculated with respect to the other sample or distribution, and vice versa. A comprehensive simulation study examines the performance of the proposed test for symmetric as well as skewed distributions. The proposed test is compared with existing depth-based nonparametric tests through empirical powers over different depth functions. The simulation study shows that the proposed test outperforms existing nonparametric depth-based tests for symmetric and skewed distributions. Finally, a real-life data set is used to demonstrate the applicability of the proposed test.

Application of functional ANOVA and functional MANOVA (단변량 및 다변량 함수 데이터에 대한 분산분석의 활용)

  • Kim, Mijeong
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.5
    • /
    • pp.579-591
    • /
    • 2022
  • Functional data are collected in various fields, and it is often necessary to test whether there are differences among groups of functional data. In this case, the point-wise ANOVA method is not appropriate, and an integrated result should be presented rather than a point-wise one. Various methods for the analysis of variance of functional data have been proposed, and they have recently been implemented in the R package fdANOVA. In this paper, I first explain ANOVA and multivariate ANOVA, and then introduce various recently proposed methods of analysis of variance for univariate and multivariate functional data. I also describe how to use the R package fdANOVA. The package is used to test the equality of weekly temperatures in Seoul and Busan through univariate functional data ANOVA, and to test the equality of multivariate functional data corresponding to handwritten images using multivariate functional data ANOVA.

KCYP data analysis using Bayesian multivariate linear model (베이지안 다변량 선형 모형을 이용한 청소년 패널 데이터 분석)

  • Insun, Lee;Keunbaik, Lee
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.6
    • /
    • pp.703-724
    • /
    • 2022
  • Although longitudinal studies mainly produce multivariate longitudinal data, most existing statistical models analyze univariate longitudinal data and are therefore limited in explaining complex correlations properly. This paper describes various methods of modeling the covariance matrix to explain the complex correlations. Among them, the modified Cholesky decomposition, the modified Cholesky block decomposition, and the hypersphere decomposition are reviewed. We then analyze Korean children and youth panel (KCYP) data using the Bayesian method. The KCYP data are multivariate longitudinal data with three response variables: school adaptation, academic achievement, and dependence on mobile phones. Several models are compared under the assumption that the correlation structure and the innovation standard deviation structure differ. In the most suitable model, all explanatory variables are significant for school adaptation and academic achievement, and only household income is insignificant when cell phone dependence is the response variable.
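
For orientation, the modified Cholesky decomposition named in the abstract above is commonly written as follows. The notation is an assumption taken from the usual textbook form, not from this paper: $\Sigma$ is the covariance matrix of an ordered response vector of length $n$, $T$ is unit lower triangular, and $D$ is diagonal.

```latex
T \Sigma T^{\top} = D, \qquad
T = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
-\phi_{21} & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
-\phi_{n1} & -\phi_{n2} & \cdots & 1
\end{pmatrix}, \qquad
D = \operatorname{diag}\left(\sigma_1^2, \ldots, \sigma_n^2\right)
```

In this form, $\phi_{tj}$ are the coefficients from regressing the $t$-th measurement on its predecessors (generalized autoregressive parameters) and $\sigma_t^2$ are the corresponding innovation variances, so $\sigma_t$ is the innovation standard deviation. Because the entries of $T$ and the logarithms of $\sigma_t^2$ are unconstrained, $\Sigma$ stays positive definite under this parameterization.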