Search | Korea Science

Selection of markers in the framework of multivariate receiver operating characteristic curve analysis in binary classification

Sameera, G;Vishnu, Vardhan R
- Communications for Statistical Applications and Methods / v.26 no.2 / pp.79-89 / 2019
Classification models based on receiver operating characteristic (ROC) curve analysis have been extended from the univariate to the multivariate setup by linearly combining multiple available markers. One such classification model is multivariate ROC curve analysis. However, in a real scenario not all markers contribute, and some may mask the contribution of other markers in classifying the individuals/objects. This paper addresses this issue by developing an algorithm that helps identify the important markers that are significant and true contributors. The proposed variable selection framework is supported by real datasets and a simulation study; it is shown to provide insight into each individual marker's significance in producing a classifier rule/linear combination with a good extent of classification.
https://doi.org/10.29220/CSAM.2019.26.2.079

Residuals Plots for Repeated Measures Data

PARK TAESUNG
- Proceedings of the Korean Statistical Society Conference / 2000.11a / pp.187-191 / 2000
In the analysis of repeated measurements, multivariate regression models that account for the correlations among the observations from the same subject are widely used. Like the usual univariate regression models, these multivariate regression models also need model diagnostic procedures. In this paper, we propose a simple graphical method to detect outliers and to investigate the goodness of model fit in repeated measures data. The graphical method is based on quantile-quantile (Q-Q) plots of the $\chi^2$ distribution and the standard normal distribution. We also propose diagnostic measures to detect influential observations. The proposed method is illustrated using two examples.
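
As a rough illustration of a $\chi^2$ Q-Q diagnostic of the kind the abstract above mentions, the sketch below pairs ordered squared Mahalanobis distances with $\chi^2$ quantiles. It is not the paper's procedure: it assumes two-dimensional residual vectors (so the $\chi^2_2$ quantile has the closed form $-2\ln(1-p)$), and the function names `mahalanobis_sq_2d` and `chi2_qq_points` are hypothetical.

```python
import math


def mahalanobis_sq_2d(rows):
    """Squared Mahalanobis distance of each 2-D row from the sample mean."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    # Entries of the sample covariance matrix (denominator n - 1).
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    det = sxx * syy - sxy ** 2
    out = []
    for x, y in rows:
        dx, dy = x - mx, y - my
        # d' S^{-1} d written out for the 2x2 case.
        out.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return out


def chi2_qq_points(rows):
    """Return (theoretical quantile, ordered distance) pairs for a Q-Q plot."""
    d2 = sorted(mahalanobis_sq_2d(rows))
    n = len(d2)
    # Chi-square with 2 df is exponential with mean 2: quantile = -2 ln(1 - p).
    q = [-2.0 * math.log(1.0 - (i + 0.5) / n) for i in range(n)]
    return list(zip(q, d2))
```

Points that fall far above the 45-degree line at the upper end of such a plot are candidate outliers; with more than two dimensions a general $\chi^2$ quantile function would be needed instead of the closed form used here.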

Statistical Outliers in Florida Counties at the Presidential Election 2000 (Analysis of the Florida Voting Results in the 2000 U.S. Presidential Election)

김현철
- The Korean Journal of Applied Statistics / v.15 no.1 / pp.21-32 / 2002
We searched for outliers in the vote data of the State of Florida from the 2000 presidential election, using a multivariate regression analysis. We found several outliers, including Palm Beach County. This means that, to claim that the "Butterfly Ballot" made Palm Beach an outlier, we should analyze the number of disqualified double-punched ballots as well as the votes.
https://doi.org/10.5351/KJAS.2002.15.1.021

Use of Multivariate Statistical Approaches for Decoding Chemical Evolution of Groundwater near Underground Storage Caverns (A Study on Groundwater Quality Variation around Underground Storage Facilities Using Multivariate Statistical Techniques)

Lee, Jeonghoon
- Journal of the Korean earth science society / v.35 no.4 / pp.225-236 / 2014
Multivariate statistical analyses have been extensively applied to hydrochemical measurements to analyze and interpret the data. This study examines anthropogenic factors obtained by applying correspondence analysis (CA) and principal component analysis (PCA) to a hydrogeochemical data set. The goal was to synthesize the hydrogeochemical information using these multivariate statistical techniques by incorporating hydrogeochemical speciation results calculated by the commonly used program WATEQ4F, which is included in NETPATH. The selected case study was a set of LPG underground storage caverns located in southeastern Korea. The highly alkaline groundwaters in this study area are an analogue for the repository system. High pH, speciation of Al, and possible precipitation of calcite characterize these groundwaters. Available groundwater quality monitoring data were used to confirm these statistical models. The present study focused on understanding the hydrogeochemical attributes and establishing the changes of phase when two anthropogenic effects (i.e., disinfection activity and cement pore water) are introduced in the study area. Comparisons between the two statistical results presented here and the findings of previous investigations highlight the descriptive capabilities of PCA using the calculated saturation index, and of CA, as exploratory tools in hydrogeochemical research.
https://doi.org/10.5467/JKESS.2014.35.4.225

On the second order property of elliptical multivariate regular variation

Moosup Kim
- Communications for Statistical Applications and Methods / v.31 no.4 / pp.459-466 / 2024
Multivariate regular variation is a popular framework for multivariate extreme value analysis. However, a suitable parametric model needs to be introduced for efficient estimation of its spectral measure. With this in view, elliptical distributions have been employed for deriving such models. On the other hand, the second order behavior of multivariate regular variation has to be specified to investigate the properties of the estimator. This paper derives such behavior by imposing a widely adopted second order regular variation condition on the representation of elliptical distributions. As a result, the second order variation for the convergence to the spectral measure is characterized by a signed measure with a regular varying index. Moreover, it leads to the asymptotic bias of the estimator. For demonstration, the multivariate t-distribution is considered.
https://doi.org/10.29220/CSAM.2024.31.4.459

Assessment of Water Quality using Multivariate Statistical Techniques: A Case Study of the Nakdong River Basin, Korea

Park, Seongmook;Kazama, Futaba;Lee, Shunhwa
- Environmental Engineering Research / v.19 no.3 / pp.197-203 / 2014
This study estimated the spatial and seasonal variation of water quality to understand the characteristics of the Nakdong River basin, Korea. Altogether, 11 parameters (discharge, water temperature, dissolved oxygen, 5-day biochemical oxygen demand, chemical oxygen demand, pH, suspended solids, electrical conductivity, total nitrogen, total phosphorus, and total organic carbon) at 22 different sites for the period 2003-2011 were analyzed using multivariate statistical techniques (cluster analysis, principal component analysis, and factor analysis). Hierarchical cluster analysis grouped the whole river basin into three zones, i.e., relatively less polluted (LP), medium polluted (MP), and highly polluted (HP), based on similarity of water quality characteristics. The results of factor analysis/principal component analysis explained up to 83.0%, 81.7%, and 82.7% of the total variance in the water quality data of the LP, MP, and HP zones, respectively. The rotated components of PCA obtained from factor analysis indicate that the parameters responsible for water quality variations were mainly related to discharge and total pollution loads (non-point pollution source) in the LP, MP, and HP areas; organic and nutrient pollution in the LP and HP zones; and temperature, DO, and TN in the LP zone. This study demonstrates the usefulness of multivariate statistical techniques for the analysis and interpretation of multi-parameter, multi-location, and multi-year data sets.
https://doi.org/10.4491/eer.2014.008

Multivariate analysis of longitudinal surveys for population median

Priyanka, Kumari;Mittal, Richa
- Communications for Statistical Applications and Methods / v.24 no.3 / pp.255-269 / 2017
This article explores the analysis of longitudinal surveys in which the same units are investigated on several occasions. A multivariate exponential ratio type estimator is proposed for the estimation of the finite population median at the current occasion in two-occasion longitudinal surveys. Information on several additional auxiliary variables, which are stable over time and readily available on both occasions, is utilized. Properties of the proposed multivariate estimator, including the optimum replacement strategy, are presented. The proposed multivariate estimator is compared with the sample median estimator when there is no matching from a previous occasion, and with the exponential ratio type estimator in successive sampling when information is available on only one additional auxiliary variable. The merits of the proposed estimator are justified by empirical interpretations and validated by a simulation study with the help of some natural populations.
https://doi.org/10.5351/CSAM.2017.24.3.255

Statistical Methods for Multivariate Missing Data in Health Survey Research (Statistical Methods for Efficiently Analyzing Data with Multivariate Missing Values in Health Survey Research)

Kim, Dong-Kee;Park, Eun-Cheol;Sohn, Myong-Sei;Kim, Han-Joong;Park, Hyung-Uk;Ahn, Chae-Hyung;Lim, Jong-Gun;Song, Ki-Jun
- Journal of Preventive Medicine and Public Health / v.31 no.4 s.63 / pp.875-884 / 1998
Missing observations are common in medical research and health survey research. Several statistical methods to handle the missing data problem have been proposed. The EM algorithm (Expectation-Maximization algorithm) is one way of efficiently handling the missing data problem based on sufficient statistics. In this paper, we developed statistical models and methods for survey data with multivariate missing observations. In particular, we adopted the EM algorithm to handle the multivariate missing observations. We assume that the multivariate observations follow a multivariate normal distribution, where the mean vector and the covariance matrix are primarily of interest. We applied the proposed statistical method to analyze data from a health survey. The data set we used came from a physician survey on the Resource-Based Relative Value Scale (RBRVS). In addition to the EM algorithm, we applied the complete case analysis, which uses only completely observed cases, and the available case analysis, which utilizes all available information. The residual and normal probability plots were evaluated to assess the assumption of normality. We found that the residual sum of squares from the EM algorithm was smaller than those of the complete-case and the available-case analyses.
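
To make the EM idea in the abstract above concrete, here is a minimal sketch for the simplest possible case, not the paper's model: a bivariate normal sample in which `x` is always observed and `y` is sometimes missing (assumed missing at random). The E-step replaces the missing sufficient statistics with their conditional expectations and the M-step recomputes the mean and covariance. The function name `em_bivariate_normal` and its arguments are hypothetical.

```python
def em_bivariate_normal(pairs, n_iter=200):
    """EM estimates for a bivariate normal; pairs holds (x, y), y may be None."""
    n = len(pairs)
    obs = [(x, y) for x, y in pairs if y is not None]
    # x is fully observed, so its mean and variance are already the MLEs.
    mx = sum(x for x, _ in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs) / n
    # Start the y-related parameters from the complete cases.
    my = sum(y for _, y in obs) / len(obs)
    syy = sum((y - my) ** 2 for _, y in obs) / len(obs)
    sxy = sum((x - mx) * (y - my) for x, y in obs) / len(obs)
    for _ in range(n_iter):
        beta = sxy / sxx                  # slope of E[y | x]
        resid_var = syy - beta * sxy      # Var(y | x)
        s_y = s_yy = s_xy = 0.0
        for x, y in pairs:
            if y is None:
                # E-step: expected y and y^2 given x and current parameters.
                ey = my + beta * (x - mx)
                eyy = ey * ey + resid_var
            else:
                ey, eyy = y, y * y
            s_y += ey
            s_yy += eyy
            s_xy += x * ey
        # M-step: update parameters from the expected sufficient statistics.
        my = s_y / n
        syy = s_yy / n - my * my
        sxy = s_xy / n - mx * my
    return mx, my, sxx, sxy, syy
```

In this monotone-missingness setting the iterations converge to the known closed-form estimates (the complete-case regression of `y` on `x`, evaluated at the mean of all `x`), which makes the sketch easy to check; general missingness patterns in more dimensions require the full conditional-expectation formulas.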

Combining cluster analysis and neural networks for the classification problem

Kim, Kyungsup;Han, Ingoo
- Proceedings of the Korean Operations and Management Science Society Conference / 1996.10a / pp.31-34 / 1996
Extensive research has compared the performance of neural networks (NN) with that of various statistical techniques for the classification problem. The empirical results of these comparative studies indicate that neural networks often outperform the traditional statistical techniques. Moreover, there have been efforts to combine various classification methods, especially multivariate discriminant analysis, with neural networks. While these efforts improve performance, they can violate the strong assumptions of multivariate discriminant analysis, namely multivariate normality of the independent variables and equality of the variance-covariance matrices across groups. In contrast, cluster analysis, like neural networks, relaxes these assumptions. We propose a new approach to classification problems that combines cluster analysis with neural networks. The resulting predictions of the composite model are more accurate than those of each individual technique.

Resistant Singular Value Decomposition and Its Statistical Applications

Park, Yong-Seok;Huh, Myung-Hoe
- Journal of the Korean Statistical Society / v.25 no.1 / pp.49-66 / 1996
The singular value decomposition is one of the most useful methods in the area of matrix computation. It provides dimension reduction, which is the central idea in many multivariate analyses. But this method is not resistant, i.e., it is very sensitive to small changes in the input data. In this article, we derive a resistant version of the singular value decomposition for principal component analysis. We also give its statistical application to the biplot, which is similar to principal component analysis in terms of dimension reduction of an n x p data matrix. Therefore, we derive the resistant principal component analysis and biplot based on the resistant singular value decomposition. They provide graphical multivariate data analyses that are relatively little influenced by outlying observations.
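
The lack of resistance described in the abstract above can be seen in a toy example. The sketch below does not implement the paper's resistant decomposition; it only shows how one outlying observation changes the first principal axis of two-dimensional data, which is the direction of the first right singular vector of the centered data matrix. The function name `first_pc_angle_deg` and the data are made up for illustration.

```python
import math


def first_pc_angle_deg(points):
    """Angle in degrees of the first principal axis of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Centered sums of squares and cross-products (scale does not affect the angle).
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Closed form for the leading eigenvector direction of a 2x2 symmetric matrix.
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))


# Seven points lying exactly on the line y = x, then the same data plus one outlier.
clean = [(float(i), float(i)) for i in range(-3, 4)]
contaminated = clean + [(0.0, 40.0)]

angle_clean = first_pc_angle_deg(clean)
angle_contaminated = first_pc_angle_deg(contaminated)
```

For the clean data the principal axis lies along the 45-degree line, while the single added point pulls it to nearly vertical, which is the kind of behavior a resistant decomposition is meant to limit.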
