• Title/Abstract/Keywords: various multivariate statistical methods

Search results: 82 items (processing time: 0.031 seconds)

Principles of Multivariate Data Visualization

  • Huh, Moon Yul;Cha, Woon Ock
    • Communications for Statistical Applications and Methods, Vol. 11, No. 3, pp. 465-474, 2004
  • Data visualization applies automated and exploratory processes to data sets in an effort to discover the information underlying the data. It provides rich visual depictions of the data. Its distinct advantage over traditional data analysis techniques is that it allows the structure of large-scale data sets, large in both the number of observations and the number of variables, to be explored through extensive interaction between the data and the end user. We discuss the principles of data visualization and evaluate the characteristics of various visualization tools according to these principles.

Analysis of Multivariate Process Capability Using Box-Cox Transformation

  • 문혜진;정영배
    • 산업경영시스템학회지, Vol. 42, No. 2, pp. 18-27, 2019
  • Process control methods based on statistical analysis apply an analysis method or mathematical model under the assumption that the process characteristic is normally distributed. However, data collected in real time by automatic measurement systems often do not follow a normal distribution. Among statistical analysis tools, the process capability index (PCI) is widely used on production sites as a measure of process capability. However, the PCI is usually applied without testing the process data for normality. If an analysis method that assumes normality is applied when that assumption is violated, it yields incorrect results and can lead to wrong actions. When the normality assumption is violated, non-normal data can be transformed into approximately normal data using an appropriate normalizing transformation. Of the various transformations available, this paper considers the Box-Cox transformation. The purpose of the study is to extend the analysis of the multivariate process capability index using the Box-Cox transformation. The study proposes a multivariate process capability index that can be used whether or not the data are normally distributed. Through computational examples, we compare and discuss the multivariate process capability index before and after the Box-Cox transformation when the process data are not normally distributed.
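The Box-Cox transformation named in this abstract can be sketched in a few lines. This is a minimal univariate illustration, not the multivariate index proposed in the paper: the function names `box_cox` and `skewness`, the sample data, and the fixed choice of lambda are all assumptions made for the example. In practice lambda is usually estimated from the data (for instance by maximum likelihood), which is omitted here.

```python
import math


def box_cox(values, lam):
    """Apply the Box-Cox power transform with a fixed lambda (hypothetical helper)."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        # The lambda -> 0 limit of (x**lam - 1) / lam is the natural logarithm.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def skewness(values):
    """Population skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Illustrative right-skewed measurements (made-up values, not from the paper).
measurements = [1.0, 2.0, 4.0, 8.0, 16.0]
transformed = box_cox(measurements, lam=0)

# Skewness is clearly positive before the transform and close to zero after it.
print(skewness(measurements), skewness(transformed))
```

If a capability index is then computed on the transformed scale, the specification limits would have to be transformed with the same lambda so that they remain comparable with the data.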

A spatial heterogeneity mixed model with skew-elliptical distributions

  • Farzammehr, Mohadeseh Alsadat;McLachlan, Geoffrey J.
    • Communications for Statistical Applications and Methods, Vol. 29, No. 3, pp. 373-391, 2022
  • The distribution of observations in most econometric studies with spatial heterogeneity is skewed. Usually, a single transformation of the data is used to approximate normality, and the transformed data are modeled under a normal assumption. This assumption is, however, not always appropriate because panel data often exhibit non-normal characteristics. In this work, the normality assumption is relaxed in spatial mixed models, allowing for spatial heterogeneity. An inference procedure based on Bayesian mixed modeling is carried out with a multivariate skew-elliptical distribution, which includes the skew-t, skew-normal, Student-t, and normal distributions as special cases. The methodology is illustrated through a simulation study, and, following the empirical literature, we fit our models to non-life insurance consumption observed between 1998 and 2002 across a spatial panel of 103 Italian provinces in order to identify its determinants. Analysis of the posterior distributions of some parameters and a comparison of various model comparison criteria indicate that the proposed model is superior to conventional ones.

Value at Risk of portfolios using copulas

  • Byun, Kiwoong;Song, Seongjoo
    • Communications for Statistical Applications and Methods, Vol. 28, No. 1, pp. 59-79, 2021
  • Value at Risk (VaR) is one of the most common risk management tools in finance. Since a portfolio of several assets, rather than a single-asset portfolio, is advantageous for risk diversification, VaR for a portfolio of two or more assets is often used. In such cases, multivariate distributions of asset returns are considered to calculate the VaR of the corresponding portfolio. Copulas are one way of generating a multivariate distribution by identifying the dependence structure of asset returns while allowing many different marginal distributions. However, they are used mainly for bivariate distributions and are not widely used in modeling joint distributions for many variables in finance. In this study, we examine the performance of various copulas for high-dimensional data and several different dependence structures. This paper compares copulas such as elliptical, vine, and hierarchical copulas in computing the VaR of portfolios to find appropriate copula functions under various dependence structures among asset return distributions. In the simulation studies under various dependence structures and in the real data analysis, the hierarchical Clayton copula shows the best performance in the VaR calculation using four assets. For the marginal distributions of single-asset returns, the normal inverse Gaussian distribution was used, since asset return distributions are generally high-peaked and heavy-tailed.
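The idea of separating the dependence structure from the marginal distributions can be sketched with a simulation. The example below is a deliberately simplified stand-in for the paper's setup: it uses a bivariate Gaussian copula rather than the vine or hierarchical copulas compared in the study, Laplace marginals rather than the normal inverse Gaussian distribution, and two assets rather than four. All function names, parameter values, and the seed are hypothetical.

```python
import math
import random
from statistics import NormalDist


def laplace_quantile(u, loc=0.0, scale=0.01):
    """Inverse CDF of a Laplace distribution, used as a heavy-tailed stand-in marginal."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep the logarithm finite
    if u < 0.5:
        return loc + scale * math.log(2.0 * u)
    return loc - scale * math.log(2.0 * (1.0 - u))


def simulate_portfolio_returns(rho, weights, n, seed=0):
    """Draw n two-asset portfolio returns linked by a Gaussian copula with correlation rho."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    returns = []
    for _ in range(n):
        # Step 1: correlated standard normals define the dependence structure.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Step 2: map to uniforms, then to the chosen marginals.
        r1 = laplace_quantile(std_normal.cdf(z1))
        r2 = laplace_quantile(std_normal.cdf(z2))
        returns.append(weights[0] * r1 + weights[1] * r2)
    return returns


def value_at_risk(returns, level=0.95):
    """Empirical VaR: the loss at the (1 - level) quantile of simulated returns."""
    ordered = sorted(returns)
    index = int((1.0 - level) * len(ordered))
    return -ordered[index]


simulated = simulate_portfolio_returns(rho=0.5, weights=(0.5, 0.5), n=5000)
print(value_at_risk(simulated, 0.95), value_at_risk(simulated, 0.99))
```

Changing `rho` while keeping the marginals fixed shows how the dependence structure alone moves the portfolio VaR, which is the property that makes copulas attractive for this calculation.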

Restricted maximum likelihood estimation of a censored random effects panel regression model

  • Lee, Minah;Lee, Seung-Chun
    • Communications for Statistical Applications and Methods, Vol. 26, No. 4, pp. 371-383, 2019
  • Panel data sets have been developed in various areas, and many recent studies have analyzed panel, or longitudinal, data sets. Maximum likelihood (ML) may be the most common statistical method for analyzing panel data models; however, inference based on the ML estimate will have an inflated Type I error because the ML method tends to give a downwardly biased estimate of the variance components when the sample size is small. The underestimation can be severe when the data are incomplete. This paper proposes the restricted maximum likelihood (REML) method for a random effects panel data model with a censored dependent variable. The likelihood function of the model is complex in that it includes a multidimensional integral. Many authors have proposed integral approximation methods for computing the likelihood function; however, it is well known that such methods are inadequate for high-dimensional integrals in practice. This paper introduces the use of the moments of a truncated multivariate normal random vector to calculate the multidimensional integral. In addition, a proper asymptotic standard error of the REML estimate is given.

Identification of the geographical origin of cheonggukjang by using Fourier transform near-infrared spectroscopy and energy dispersive X-ray fluorescence spectrometry

  • 강동진;문지영;이동길;이성훈
    • 한국식품과학회지, Vol. 48, No. 5, pp. 418-423, 2016
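  • Analytical methods using a near-infrared spectrometer and an energy dispersive X-ray fluorescence spectrometer were developed, yielding discriminant functions with high accuracies of 97.5% and 98.0%, respectively. Validation with commercially distributed samples confirmed discrimination accuracies of 96.3% and 95.0%, respectively. These results confirm that the geographical origin of cheonggukjang can be identified using a near-infrared spectrometer and an energy dispersive X-ray fluorescence spectrometer. This is attributed to differences between domestic and imported products in near-infrared absorbance, which depends on the content of organic components, and in X-ray fluorescence intensity, which depends on the content of inorganic components.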

KCYP data analysis using Bayesian multivariate linear model

  • 이인선;이근백
    • 응용통계연구, Vol. 35, No. 6, pp. 703-724, 2022
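  • Analysis of multivariate longitudinal data must correctly estimate the correlations present in repeatedly measured data. Longitudinal studies mostly generate multivariate longitudinal data, but most existing statistical models analyze the data univariately and therefore fail to explain the complex correlations in multivariate longitudinal data. This paper therefore reviews various methods of modeling the covariance matrix in order to explain these complex correlations. Among them, we examine the modified Cholesky decomposition, the modified Cholesky block decomposition, and the hypersphere decomposition. To address the sparsity problem of the generalized autoregressive parameter matrix, we analyze the youth panel (KCYP) data using a Bayesian method. The youth panel data are multivariate longitudinal data, and school adjustment, academic achievement, and mobile phone dependence are considered as response variables. Several models are compared under different assumptions about the autocorrelation structure and the innovation standard deviation structure. In the best-fitting model, all explanatory variables are significant for school adjustment and academic achievement, and all explanatory variables except private tutoring time are significant when mobile phone dependence is the response variable.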

Application of functional ANOVA and functional MANOVA

  • 김미정
    • 응용통계연구, Vol. 35, No. 5, pp. 579-591, 2022
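  • Functional data are collected in various fields, and functional data often need to be compared between groups. In such cases, pointwise ANOVA is inadequate, and an integrated result needs to be presented. Various methods have been proposed for this purpose and were recently implemented in the R package fdANOVA. In this paper, we first explain ANOVA and MANOVA and then describe various recently proposed ANOVA methods for univariate and multivariate functional data. We also explain how to use the R package fdANOVA. Using this package, we compare weekly temperatures in Seoul and Busan through univariate functional ANOVA, and we convert handwriting images into multivariate functional data and compare them using multivariate functional ANOVA.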

Analysis of biodiesel quality based on infrared spectroscopy and multivariate statistics

  • 김혜실;조현우;유준
    • 분석과학, Vol. 25, No. 4, pp. 214-222, 2012
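  • ASTM (American Society for Testing and Materials) D6751-10 specifies not only quality standards for biodiesel but also analytical methods. However, quality analysis of biodiesel and its various impurities according to the ASTM standard is costly and time-consuming. This study aimed to develop a system that determines the concentrations of biodiesel and its impurities from a single measurement, using infrared spectroscopy and PLS (partial least squares), a multivariate statistical method. In particular, the performance of four preprocessing methods (SNV, MSC, OSC, and Savitzky-Golay) was compared for scatter correction and noise reduction of the spectrum of each substance obtained in infrared analysis. When the biodiesel calibration model needed for quality analysis was built with PLS, Savitzky-Golay preprocessing gave the highest accuracy.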
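Of the four preprocessing methods listed, SNV (standard normal variate) is the simplest to illustrate: each spectrum is centered on its own mean and scaled by its own standard deviation, which removes additive offsets and multiplicative scatter effects. The sketch below is only an illustration under that standard definition. The function name `snv` and the absorbance values are made up, and the paper's PLS calibration step is not reproduced.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)  # sample standard deviation across wavelengths
    if sd == 0:
        raise ValueError("SNV is undefined for a constant spectrum")
    return [(a - mean) / sd for a in spectrum]


# Illustrative absorbance values at five wavelengths (not real measurements).
raw = [0.10, 0.35, 0.80, 0.40, 0.15]

# The same spectrum distorted by a multiplicative scatter factor and a baseline offset.
scattered = [2.0 * a + 0.5 for a in raw]

# After SNV the two versions agree up to floating-point rounding.
largest_gap = max(abs(x - y) for x, y in zip(snv(raw), snv(scattered)))
print(largest_gap)
```

Because SNV works on one spectrum at a time, it needs no reference spectrum, unlike MSC, which corrects each spectrum against a reference such as the mean spectrum of the data set.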

Simultaneous determination and difference evaluation of 14 ginsenosides in Panax ginseng roots cultivated in different areas and ages by high-performance liquid chromatography coupled with triple quadrupole mass spectrometer in the multiple reaction-monitoring mode combined with multivariate statistical analysis

  • Xiu, Yang;Li, Xue;Sun, Xiuli;Xiao, Dan;Miao, Rui;Zhao, Huanxi;Liu, Shuying
    • Journal of Ginseng Research, Vol. 43, No. 4, pp. 508-516, 2019
  • Background: Ginsenosides are not only the principal bioactive components of Panax ginseng Meyer but also important indexes for its quality assessment. Their contents in cultivated ginseng vary with the growth environment and age. The present study aimed to evaluate the significant differences among 36 cultivated ginseng samples of different cultivation areas and ages based on the simultaneously determined contents of 14 ginsenosides. Methods: A method based on high-performance liquid chromatography (HPLC) coupled with a triple quadrupole mass spectrometer (MS) was developed and used in the multiple reaction-monitoring (MRM) mode (HPLC-MRM/MS) for the quantitative analysis of ginsenosides. Multivariate statistical analysis, such as principal component analysis and partial least squares-discriminant analysis, was applied to discriminate ginseng samples of various cultivation areas and ages and to discover the differentially accumulated ginsenoside markers. Results: The developed HPLC-MRM/MS method was validated to be precise, accurate, stable, sensitive, and repeatable for the simultaneous determination of 14 ginsenosides. The 3- and 5-yr-old ginseng samples were distinctly differentiated by all of the multivariate statistical analyses applied, whereas the 4-yr-old samples were similar to either the 3- or the 5-yr-old samples in their ginsenoside contents. Among the 14 detected ginsenosides, Rg1, Rb1, Rb2, Rc, 20(S)-Rf, 20(S)-Rh1, and Rb3 were identified as potential markers for the differentiation of cultivation ages. In addition, the 5-yr-old samples could be classified by cultivation area based on their ginsenoside contents, whereas the 3- and 4-yr-old samples showed little difference between cultivation areas. Conclusion: This study demonstrated that the HPLC-MRM/MS method combined with multivariate statistical analysis provides deep insight into the accumulation characteristics of ginsenosides and could be used to differentiate ginseng cultivated in different areas and at different ages.