• Title/Summary/Keyword: Multivariate Statistical Analysis

Selection of markers in the framework of multivariate receiver operating characteristic curve analysis in binary classification

  • Sameera, G;Vishnu, Vardhan R
    • Communications for Statistical Applications and Methods / v.26 no.2 / pp.79-89 / 2019
  • Classification models pertaining to receiver operating characteristic (ROC) curve analysis have been extended from the univariate to the multivariate setup by linearly combining the available markers. One such classification model is the multivariate ROC curve analysis. However, not all markers contribute in a real scenario, and some may mask the contribution of other markers in classifying the individuals/objects. This paper addresses this issue by developing an algorithm that helps identify the markers that are significant and true contributors. The proposed variable selection framework is supported by real datasets and a simulation study, and it is shown to provide insight into each marker's significance in producing a classifier rule/linear combination with a good extent of classification.
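
As a rough illustration of scoring subjects by a linear combination of markers and checking what one marker contributes, the sketch below compares the empirical AUC with and without a marker. The data, the equal weights, and the helper names (`auc`, `combine`, `auc_for`) are all hypothetical, and this is not the selection algorithm proposed in the paper.

```python
from itertools import product


def auc(pos_scores, neg_scores):
    """Empirical AUC in Mann-Whitney form: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def combine(rows, weights):
    """Score each subject by a linear combination of its marker values."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


# Made-up data: each row holds (marker_1, marker_2, marker_3) for one subject.
diseased = [(2.1, 1.0, 0.3), (1.8, 1.4, 0.9), (2.5, 0.7, 0.1), (1.6, 1.2, 0.8)]
healthy = [(1.0, 0.9, 0.5), (1.7, 0.2, 0.7), (0.8, 1.1, 0.2), (1.2, 0.4, 0.6)]


def auc_for(weights):
    """AUC of the combined score for a given (hypothetical) weight vector."""
    return auc(combine(diseased, weights), combine(healthy, weights))


full = auc_for((1.0, 1.0, 1.0))     # all three markers, equal weights (assumed)
reduced = auc_for((1.0, 1.0, 0.0))  # same combination with marker_3 dropped
print("AUC with all markers:   ", full)
print("AUC without marker_3:   ", reduced)
```

If dropping a marker leaves the AUC essentially unchanged, that marker adds little to the combined classifier on this toy data. A formal selection procedure would also need a significance assessment.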

Residuals Plots for Repeated Measures Data

  • Park, Taesung
    • Proceedings of the Korean Statistical Society Conference / 2000.11a / pp.187-191 / 2000
  • In the analysis of repeated measurements, multivariate regression models that account for the correlations among observations from the same subject are widely used. Like the usual univariate regression models, these multivariate regression models also need model diagnostic procedures. In this paper, we propose a simple graphical method to detect outliers and to investigate the goodness of model fit in repeated measures data. The graphical method is based on quantile-quantile (Q-Q) plots of the $\chi^2$ distribution and the standard normal distribution. We also propose diagnostic measures to detect influential observations. The proposed method is illustrated using two examples.
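
One common way to build a $\chi^2$ Q-Q plot for multivariate residuals is to compare ordered squared Mahalanobis distances with $\chi^2$ quantiles. The sketch below does this for the special case of two measurement occasions, where the $\chi^2_2$ quantile has the closed form $-2\ln(1-p)$. The residuals, the covariance matrix, and the function names are assumptions made for illustration, and the paper's exact construction may differ.

```python
import math


def mahalanobis_sq(r, cov):
    """Squared Mahalanobis distance of a 2-vector r under a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    x, y = r
    # r' * inverse(cov) * r, written out for the 2x2 case
    return (d * x * x - (b + c) * x * y + a * y * y) / det


def chi2_df2_quantile(p):
    """Quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - p)


def qq_pairs(residuals, cov):
    """Return (theoretical quantile, ordered distance) pairs for a Q-Q plot."""
    d2 = sorted(mahalanobis_sq(r, cov) for r in residuals)
    n = len(d2)
    # Plotting positions (i + 0.5) / n for i = 0, ..., n - 1
    return [(chi2_df2_quantile((i + 0.5) / n), d) for i, d in enumerate(d2)]


# Made-up residual vectors for subjects measured on two occasions.
residuals = [(0.2, -0.1), (-0.5, 0.4), (1.0, 0.8), (-0.3, -0.6), (3.0, -2.5)]
cov = ((1.0, 0.3), (0.3, 1.0))  # assumed within-subject covariance

for q, d in qq_pairs(residuals, cov):
    print(f"{q:6.3f}  {d:7.3f}")
```

Points that fall far above the 45-degree line in the upper tail are candidate outliers. With more than two occasions, the same idea applies with a general matrix inverse and $\chi^2_p$ quantiles.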

Statistical Outliers in Florida Counties at the Presidential Election 2000 (2000년 미국대선 플로리다주의 투표결과 분석)

  • 김현철
    • The Korean Journal of Applied Statistics / v.15 no.1 / pp.21-32 / 2002
  • We searched for outliers in the vote data of the State of Florida from the 2000 presidential election using multivariate regression analysis. We found several outliers, including Palm Beach County. This means that, to argue that the "Butterfly Ballot" made Palm Beach an outlier, we should analyze the number of disqualified double-punched ballots as well as the votes.

Use of Multivariate Statistical Approaches for Decoding Chemical Evolution of Groundwater near Underground Storage Caverns (다변량통계기법을 이용한 지하저장시설 주변의 지하수질 변동에 관한 연구)

  • Lee, Jeonghoon
    • Journal of the Korean earth science society / v.35 no.4 / pp.225-236 / 2014
  • Multivariate statistical analyses have been extensively applied to hydrochemical measurements to analyze and interpret the data. This study examines anthropogenic factors obtained by applying correspondence analysis (CA) and principal component analysis (PCA) to a hydrogeochemical data set. The goal was to synthesize the hydrogeochemical information using these multivariate statistical techniques by incorporating hydrogeochemical speciation results calculated by the commonly used program WATEQ4F, which is included in NETPATH. The selected case study was a set of LPG underground storage caverns located in southeastern Korea. The highly alkaline groundwaters in this study area are an analogue for the repository system. High pH, speciation of Al, and possible precipitation of calcite characterize these groundwaters. Available groundwater quality monitoring data were used to confirm these statistical models. The present study focused on understanding the hydrogeochemical attributes and establishing the changes of phase when two anthropogenic effects (i.e., disinfection activity and cement pore water) were introduced in the study area. Comparisons between the two statistical results presented and the findings of previous investigations highlight the descriptive capabilities of PCA using calculated saturation indices and of CA as exploratory tools in hydrogeochemical research.

On the second order property of elliptical multivariate regular variation

  • Moosup Kim
    • Communications for Statistical Applications and Methods / v.31 no.4 / pp.459-466 / 2024
  • Multivariate regular variation is a popular framework for multivariate extreme value analysis. However, a suitable parametric model needs to be introduced for efficient estimation of its spectral measure. With this in view, elliptical distributions have been employed for deriving such models. On the other hand, the second order behavior of multivariate regular variation has to be specified in order to investigate the properties of the estimator. This paper derives such behavior by imposing a widely adopted second order regular variation condition on the representation of elliptical distributions. As a result, the second order variation for the convergence to the spectral measure is characterized by a signed measure with a regularly varying index. Moreover, it leads to the asymptotic bias of the estimator. For demonstration, the multivariate t-distribution is considered.

Assessment of Water Quality using Multivariate Statistical Techniques: A Case Study of the Nakdong River Basin, Korea

  • Park, Seongmook;Kazama, Futaba;Lee, Shunhwa
    • Environmental Engineering Research / v.19 no.3 / pp.197-203 / 2014
  • This study estimated the spatial and seasonal variation of water quality to understand the characteristics of the Nakdong River basin, Korea. Altogether, 11 parameters (discharge, water temperature, dissolved oxygen, 5-day biochemical oxygen demand, chemical oxygen demand, pH, suspended solids, electrical conductivity, total nitrogen, total phosphorus, and total organic carbon) at 22 different sites for the period 2003-2011 were analyzed using multivariate statistical techniques (cluster analysis, principal component analysis and factor analysis). Hierarchical cluster analysis grouped the whole river basin into three zones, i.e., relatively less polluted (LP), medium polluted (MP) and highly polluted (HP), based on similarity of water quality characteristics. The results of factor analysis/principal component analysis explained up to 83.0%, 81.7% and 82.7% of the total variance in the water quality data of the LP, MP, and HP zones, respectively. The rotated components of PCA obtained from factor analysis indicate that the parameters responsible for water quality variations were mainly related to discharge and total pollution loads (non-point pollution source) in the LP, MP and HP areas; organic and nutrient pollution in the LP and HP zones; and temperature, DO and TN in the LP zone. This study demonstrates the usefulness of multivariate statistical techniques for the analysis and interpretation of multi-parameter, multi-location and multi-year data sets.
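
To make "percentage of total variance explained" concrete, the sketch below standardizes a small table of readings, forms the correlation matrix, and finds the share of variance carried by the first principal component using power iteration. The readings are made up, the function names are hypothetical, and the study itself used rotated components from factor analysis/PCA rather than this simplified single-component calculation.

```python
import math


def correlation_matrix(rows):
    """Sample correlation matrix of the columns of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[i] * zr[j] for zr in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenvalue(m, iters=500):
    """Largest eigenvalue of a symmetric PSD matrix via power iteration."""
    p = len(m)
    v = [float(j + 1) for j in range(p)]  # arbitrary non-uniform start vector
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
    return sum(v[i] * mv[i] for i in range(p))  # Rayleigh quotient


def first_pc_share(rows):
    """Fraction of total variance explained by the first principal component."""
    corr = correlation_matrix(rows)
    # The trace of a correlation matrix equals the number of variables.
    return leading_eigenvalue(corr) / len(corr)


# Made-up readings: (water temperature, dissolved oxygen, BOD) at six sites.
readings = [(12.1, 10.2, 1.1), (15.4, 9.1, 1.9), (18.9, 8.3, 2.4),
            (21.3, 7.6, 3.2), (24.0, 6.9, 3.5), (26.2, 6.5, 4.4)]

print(f"First PC explains {100 * first_pc_share(readings):.1f}% of the variance")
```

Summing the shares of the first few components in the same way gives cumulative figures like those reported in the abstract.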

Multivariate analysis of longitudinal surveys for population median

  • Priyanka, Kumari;Mittal, Richa
    • Communications for Statistical Applications and Methods / v.24 no.3 / pp.255-269 / 2017
  • This article explores the analysis of longitudinal surveys in which the same units are investigated on several occasions. A multivariate exponential ratio type estimator is proposed for estimating the finite population median at the current occasion in two-occasion longitudinal surveys. Information on several additional auxiliary variables, which are stable over time and readily available on both occasions, is utilized. Properties of the proposed multivariate estimator, including the optimum replacement strategy, are presented. The proposed multivariate estimator is compared with the sample median estimator when there is no matching from a previous occasion and with the exponential ratio type estimator in successive sampling when information is available on only one additional auxiliary variable. The merits of the proposed estimator are justified by empirical interpretations and validated by a simulation study with the help of some natural populations.

Statistical Methods for Multivariate Missing Data in Health Survey Research (보건조사연구에서 다변량결측치가 내포된 자료를 효율적으로 분석하기 위한 통계학적 방법)

  • Kim, Dong-Kee;Park, Eun-Cheol;Sohn, Myong-Sei;Kim, Han-Joong;Park, Hyung-Uk;Ahn, Chae-Hyung;Lim, Jong-Gun;Song, Ki-Jun
    • Journal of Preventive Medicine and Public Health / v.31 no.4 s.63 / pp.875-884 / 1998
  • Missing observations are common in medical research and health survey research. Several statistical methods to handle the missing data problem have been proposed. The EM algorithm (Expectation-Maximization algorithm) is one way of efficiently handling the missing data problem based on sufficient statistics. In this paper, we developed statistical models and methods for survey data with multivariate missing observations. In particular, we adopted the EM algorithm to handle the multivariate missing observations. We assume that the multivariate observations follow a multivariate normal distribution, where the mean vector and the covariance matrix are primarily of interest. We applied the proposed statistical method to analyze data from a health survey. The data set we used came from a physician survey on the Resource-Based Relative Value Scale (RBRVS). In addition to the EM algorithm, we applied the complete case analysis, which uses only completely observed cases, and the available case analysis, which utilizes all available information. The residual and normal probability plots were evaluated to assess the assumption of normality. We found that the residual sum of squares from the EM algorithm was smaller than those of the complete-case and the available-case analyses.

Combining cluster analysis and neural networks for the classification problem

  • Kim, Kyungsup;Han, Ingoo
    • Proceedings of the Korean Operations and Management Science Society Conference / 1996.10a / pp.31-34 / 1996
  • Extensive research has compared the performance of neural networks (NN) with that of various statistical techniques for the classification problem. The empirical results of these comparative studies indicate that neural networks often outperform the traditional statistical techniques. Moreover, there have been some efforts to combine various classification methods, especially multivariate discriminant analysis, with neural networks. While these efforts improve performance, a problem remains: they can violate the strict assumptions of multivariate discriminant analysis, namely multivariate normality of the independent variables and equality of the variance-covariance matrices across groups. In contrast, cluster analysis, like neural networks, relaxes these assumptions. We propose a new approach to classification problems that combines cluster analysis with neural networks. The resulting predictions of the composite model are more accurate than those of each individual technique.

Resistant Singular Value Decomposition and Its Statistical Applications

  • Park, Yong-Seok;Huh, Myung-Hoe
    • Journal of the Korean Statistical Society / v.25 no.1 / pp.49-66 / 1996
  • The singular value decomposition is one of the most useful methods in the area of matrix computation. It gives dimension reduction, which is the central idea in many multivariate analyses. But this method is not resistant, i.e., it is very sensitive to small changes in the input data. In this article, we derive a resistant version of the singular value decomposition for principal component analysis. We also give its statistical applications to the biplot, which is similar to principal component analysis in terms of the dimension reduction of an n x p data matrix. Thus, we derive the resistant principal component analysis and biplot based on the resistant singular value decomposition. They provide graphical multivariate data analyses that are relatively little influenced by outlying observations.
