• Title/Summary/Keyword: multivariate data analysis

A Study on High Breakdown Discriminant Analysis : A Monte Carlo Simulation

  • Moon Sup;Young Joo;Youngjo
    • Communications for Statistical Applications and Methods / v.7 no.1 / pp.225-232 / 2000
  • The linear and quadratic discriminant functions based on normal theory are widely used to classify an observation into one of several predefined groups, but these discriminant functions are sensitive to outliers. A high breakdown procedure for estimating the location and scatter of multivariate data is the minimum volume ellipsoid (MVE) estimator. To obtain high breakdown classifiers, outliers in multivariate data are detected using the robust Mahalanobis distance based on MVE estimators, and the resulting weighted estimators are inserted into the classification functions. A small-sample Monte Carlo study shows that the high breakdown robust procedures perform better than the classical classifiers.
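
The outlier-detection step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: it uses the classical random-subset approximation to the MVE rather than an exact MVE, assumes hard 0/1 weights (the abstract does not specify the weighting), and runs on synthetic bivariate data. The function names and the 97.5% cutoff are assumptions made for the example.

```python
import numpy as np


def mve_resampling(X, chi2_median, n_trials=500, seed=0):
    """Approximate MVE location/scatter from random (p+1)-point subsets."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = (n + p + 1) // 2  # number of points the ellipsoid must cover
    best_log_volume, best_center, best_scatter = np.inf, None, None
    for _ in range(n_trials):
        idx = rng.choice(n, size=p + 1, replace=False)
        center = X[idx].mean(axis=0)
        cov = np.cov(X[idx], rowvar=False)
        sign, logdet = np.linalg.slogdet(cov)
        if sign <= 0:  # degenerate subset, skip it
            continue
        diff = X - center
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        scale = np.partition(d2, h - 1)[h - 1]  # h-th smallest squared distance
        # Log-volume (up to a constant) of the ellipsoid inflated to cover h points.
        log_volume = 0.5 * (p * np.log(scale) + logdet)
        if log_volume < best_log_volume:
            # Rescale so the scatter is consistent under normality.
            best_log_volume = log_volume
            best_center, best_scatter = center, scale * cov / chi2_median
    return best_center, best_scatter


def squared_distances(X, center, scatter):
    """Squared Mahalanobis distances of the rows of X."""
    diff = X - center
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(scatter), diff)


# Synthetic data (an assumption): 95 clean points plus 5 shifted outliers.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(95, 2)), rng.normal(loc=8.0, size=(5, 2))])

# With 2 degrees of freedom the chi-square quantile is -2 * ln(1 - q).
chi2_median = -2.0 * np.log(0.5)
chi2_cutoff = -2.0 * np.log(0.025)

center, scatter = mve_resampling(X, chi2_median)
keep = squared_distances(X, center, scatter) <= chi2_cutoff  # weight 1 or 0
weighted_mean = X[keep].mean(axis=0)
weighted_cov = np.cov(X[keep], rowvar=False)
```

The reweighted mean and covariance would then replace the classical estimates inside the linear or quadratic discriminant function for each group.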

Penalized least distance estimator in the multivariate regression model (다변량 선형회귀모형의 벌점화 최소거리추정에 관한 연구)

  • Jungmin Shin;Jongkyeong Kang;Sungwan Bang
    • The Korean Journal of Applied Statistics / v.37 no.1 / pp.1-12 / 2024
  • In many real-world data sets, multiple response variables depend on the same set of explanatory variables. In particular, when several response variables are correlated with each other, simultaneous estimation that accounts for this correlation may be more effective than analyzing each response variable separately. In such multivariate regression analysis, the least distance estimator (LDE) estimates the regression coefficients simultaneously by minimizing the distance between each training observation and its fitted value in a multidimensional Euclidean space. It is also robust to outliers. In this paper, we examine the least distance estimation method in multivariate linear regression analysis and, furthermore, present the penalized least distance estimator (PLDE) for efficient variable selection. We propose the LDE with an adaptive group LASSO penalty term (AGLDE), which reflects the correlation between response variables in the model and efficiently selects variables according to the importance of the explanatory variables. The validity of the proposed method was confirmed through simulations and real data analysis.
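
In one common notation (the symbols here are assumptions, not taken from the paper), with response vectors $y_i \in \mathbb{R}^q$, predictor vectors $x_i \in \mathbb{R}^p$, and a coefficient matrix $B \in \mathbb{R}^{p \times q}$ whose $j$-th row is $\beta_j$, the least distance criterion and an adaptive group LASSO penalized version can be sketched as:

$$\hat{B}_{\mathrm{LDE}} = \arg\min_{B} \sum_{i=1}^{n} \lVert y_i - B^{\top} x_i \rVert_2$$

$$\hat{B}_{\mathrm{AGLDE}} = \arg\min_{B} \sum_{i=1}^{n} \lVert y_i - B^{\top} x_i \rVert_2 + \lambda \sum_{j=1}^{p} w_j \lVert \beta_j \rVert_2$$

Because the residual norm is not squared, a large residual contributes linearly rather than quadratically, which is where the robustness to outliers comes from. The row-wise group penalty removes a predictor from all responses at once. The tuning parameter $\lambda \ge 0$ and the weights $w_j$ (often taken as inverse norms of an initial estimate) are illustrative assumptions.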

Genetic parameters for worm resistance in Santa Inês sheep using the Bayesian animal model

  • Rodrigues, Francelino Neiva;Sarmento, Jose Lindenberg Rocha;Leal, Tania Maria;de Araujo, Adriana Mello;Filho, Luiz Antonio Silva Figueiredo
    • Animal Bioscience / v.34 no.2 / pp.185-191 / 2021
  • Objective: The objective of this study was to estimate the genetic parameters for worm resistance (WR) and associated characteristics, using the linear-threshold animal model via Bayesian inference in single- and multiple-trait analyses. Methods: Data were collected from a herd of Santa Inês breed sheep. All information was collected with animals submitted to natural contamination conditions. All data (number of eggs per gram of feces [FEC], Famacha score [FS], body condition score [BCS], and hematocrit [HCT]) were collected on the same day. The animals were weighed individually on the day after collection (after 12-h fasting). The WR trait was defined by the multivariate cluster analysis, using the FEC, HCT, BCS, and FS of material collected from naturally infected sheep of the Santa Inês breed. The variance components and genetic parameters for the WR, FEC, HCT, BCS, and FS traits were estimated using the Bayesian inference under the linear and threshold animal model. Results: A low magnitude was obtained for repeatability of worm-related traits. The mean values estimated for heritability were of low-to-high (0.05 to 0.88) magnitude. The FEC, HCT, BCS, FS, and body weight traits showed higher heritability (although low magnitude) in the multiple-trait model due to increased information about traits. All WR characters showed a significant genetic correlation, and heritability estimates ranged from low (0.44; single-trait model) to high (0.88; multiple-trait model). Conclusion: Therefore, we suggest that FS be included as a criterion of ovine genetic selection for endoparasite resistance using the trait defined by multivariate cluster analysis, as it will provide greater genetic gains when compared to any single trait. In addition, its measurement is easy and inexpensive, exhibiting greater heritability and repeatability and a high genetic correlation with the trait of resistance to worms.

Correlation analysis of human urinary metabolites related to gender and obesity using NMR-based metabolic profiling

  • Kim, Ja-Han;Park, Jung-Dae;Park, Sung-Soo;Hwang, Geum-Sook
    • Journal of the Korean Magnetic Resonance Society / v.16 no.1 / pp.46-66 / 2012
  • Metabolomic studies using human urine have shown that human metabolism is altered by a variety of environmental, cultural, and physiological factors. Comprehensive information about normal human metabolite profiles is necessary for accurate clinical diagnosis of disease and for disease prevention and treatment. In this study, metabolite correlation analyses, using $^{1}$H nuclear magnetic resonance (NMR) spectroscopy coupled with multivariate statistics, were performed on human urine to compare metabolic differences based on gender and/or obesity in healthy human subjects. First, we applied partial least squares discriminant analysis to the NMR spectral data set to verify the data's ability to discriminate by gender and obesity. Then, the differences in metabolite-metabolite correlation between male and female, and between normal and high body mass index (obese) subjects were investigated through pairwise correlations. Creatine and several metabolites, including isoleucine, trans-aconitate, and trimethylamine N-oxide (TMAO), exhibited different quantitative relationships depending on gender. Dimethylamine had a different correlation with glycine and TMAO, based on gender. The correlation of TMAO with amino acids was considerably lower in obese, compared to normal, subjects. We expect that the results will shed light on the metabolic pathways of healthy humans and will assist in the accurate diagnosis of human disease.

Low Income and Rural County of Residence Increase Mortality from Bone and Joint Sarcomas

  • Cheung, Min Rex
    • Asian Pacific Journal of Cancer Prevention / v.14 no.9 / pp.5043-5047 / 2013
  • Background: This is part of a larger effort to characterize the effects of socio-economic factors (SEFs) on cancer outcome. Surveillance, Epidemiology and End Results (SEER) bone and joint sarcoma (BJS) data were used to identify potential disparities in cause-specific survival (CSS). Materials and Methods: This study analyzed SEFs in conjunction with biologic and treatment factors. Absolute BJS-specific risks were calculated, and the areas under the receiver operating characteristic (ROC) curve were computed for predictors. Actuarial survival analysis was performed with the Kaplan-Meier method. The Kolmogorov-Smirnov two-sample test was used to compare two survival curves. The Cox proportional hazards model was used for multivariate analysis. Results: There were 13,501 patients diagnosed with BJS from 1973 to 2009. The mean follow-up time (SD) was 75.6 (90.1) months. Staging was the most predictive factor of outcome (ROC area of 0.68). SEER stage, histology, primary site, and sex were highly significant pre-treatment predictors of CSS. Under multivariate analysis, patients living in low-income neighborhoods and rural areas had a 2% and 5% disadvantage in cause-specific survival, respectively. Conclusions: This study found a 2-5% decrement in CSS of BJS due to SEFs. These data may be used to generate testable hypotheses for future clinical trials to eliminate BJS outcome disparities.

APPLICATION OF MULTIVARIATE DISCRIMINANT ANALYSIS FOR CLASSIFYING PROFICIENCY OF EQUIPMENT OPERATORS

  • Ruel R. Cabahug;Ruth Guinita-Cabahug;David J. Edwards
    • International conference on construction engineering and project management / 2005.10a / pp.662-666 / 2005
  • Using data gathered from the expert opinion of plant and equipment professionals, this paper presents the key variables that may constitute a maintenance-proficient plant operator. Multivariate Discriminant Analysis (MDA) was applied to the data, and the resulting model was subjected to sensitivity analysis. Results showed that the MDA model was able to classify plant operators' proficiency with 94.10 percent accuracy and identified nine (9) key variables of a maintenance-proficient plant operator. The key variables were: i) number of years of experience as equipment operator (PQ1); ii) eye-hand coordination (PQ9); iii) eye-hand-foot coordination (PQ10); iv) planning skills (TE16); v) pay/wage (MQ1); vi) work satisfaction (MQ4); vii) operator responsibilities as defined by management (MF1); viii) clear management policies (MF4); and ix) management pay scheme (MF5). The classification procedure with these nine variables formed the general model, with the equation: OMP (general) = 0.516PQ1 + 0.309PQ9 + 0.557PQ10 + 0.831TE16 + 0.8MQ1 + 0.0216MQ4 + 0.136MF1 + 0.28MF4 + 0.332MF5 - 4.387
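
The general equation above is a plain linear score, so evaluating it for one operator could look like the sketch below. The coefficients and intercept are copied from the abstract; the function name `omp_score` and the input values are hypothetical, and the abstract gives neither the rating scale nor the cutoff used to assign a proficiency class.

```python
# Coefficients copied from the general OMP equation quoted in the abstract.
OMP_COEFFICIENTS = {
    "PQ1": 0.516,
    "PQ9": 0.309,
    "PQ10": 0.557,
    "TE16": 0.831,
    "MQ1": 0.8,
    "MQ4": 0.0216,
    "MF1": 0.136,
    "MF4": 0.28,
    "MF5": 0.332,
}
OMP_INTERCEPT = -4.387


def omp_score(ratings):
    """Return the linear discriminant score for one operator.

    `ratings` maps each of the nine variable codes to a numeric value.
    The rating scale is not stated in the abstract, so inputs are illustrative.
    """
    missing = set(OMP_COEFFICIENTS) - set(ratings)
    if missing:
        raise KeyError(f"missing variables: {sorted(missing)}")
    return OMP_INTERCEPT + sum(
        coef * ratings[name] for name, coef in OMP_COEFFICIENTS.items()
    )


# Hypothetical operator rated 1 on every variable.
example_score = omp_score({name: 1.0 for name in OMP_COEFFICIENTS})
```

The largest coefficients (TE16 and MQ1) indicate that planning skills and pay move the score most per unit change, assuming all variables share one scale.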

Investigating the performance of different decomposition methods in rainfall prediction from LightGBM algorithm

  • Narimani, Roya;Jun, Changhyun;Nezhad, Somayeh Moghimi;Parisouj, Peiman
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.150-150 / 2022
  • This study investigates the role of decomposition methods in achieving high accuracy in daily rainfall prediction with the light gradient boosting machine (LightGBM) algorithm. Empirical mode decomposition (EMD) and singular spectrum analysis (SSA) were used to decompose and reconstruct the input time series into trend terms, fluctuating terms, and noise components. The time series decomposed by EMD and SSA were used as input data for the LightGBM algorithm in two hybrid models: the empirical mode-based light gradient boosting machine (EMDGBM) and the singular spectrum analysis-based light gradient boosting machine (SSAGBM), respectively. Four parameters (temperature, humidity, wind speed, and rainfall) at a daily scale from 2003 to 2017 were used as input data for daily rainfall prediction. The statistical performance indicators show that the SSAGBM model performs better than both the EMDGBM model and the original LightGBM algorithm without decomposition. This suggests that the SSA method improved the accuracy of the LightGBM algorithm in rainfall prediction when a multivariate dataset was used.

Classification of Korean Porcelain by Neutron Activation Analysis (중성자 방사화분석에 의한 한국자기의 분류)

  • Gang, Hyeong-Tae;Lee, Cheol
    • 보존과학연구 / s.6 / pp.111-120 / 1985
  • Data on the concentrations of Na, K, Sc, Cr, Fe, Co, Cu, Ga, Rb, Cs, Ba, La, Ce, Sm, Eu, Tb, Lu, Hf, Ta, and Th obtained by Neutron Activation Analysis have been used to characterise Korean porcelain sherds by multivariate analysis. The mathematical approach employed is Principal Component Analysis (PCA). PCA was found to be helpful for dimensionality reduction and for obtaining information regarding (a) the number of independent causal variables required to account for the variability in the overall data set, (b) the extent to which a given variable contributes to a component, and (c) the number of causal variables required to explain the total variability of each measured variable.
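
The three kinds of information listed in (a) to (c) can be read off a standard PCA, sketched below. This is an illustration on synthetic data, not the paper's concentration measurements; standardizing each variable first and the helper name `pca` are assumptions made for the example.

```python
import numpy as np


def pca(X):
    """PCA on standardized columns via SVD.

    Returns the explained-variance ratio of each component and the loadings
    (correlations between each variable and each component).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigenvalues = s**2 / (len(X) - 1)
    ratio = eigenvalues / eigenvalues.sum()
    loadings = Vt.T * np.sqrt(eigenvalues)
    return ratio, loadings


# Synthetic stand-in: 40 samples, 6 variables driven by 2 hidden factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(40, 2))
mixing = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
X = factors @ mixing + 0.05 * rng.normal(size=(40, 6))

ratio, loadings = pca(X)
cumulative = np.cumsum(ratio)             # (a) components needed overall
contribution = loadings**2                # (b) variable-by-component share
per_variable = np.cumsum(contribution, axis=1)  # (c) components needed per variable
```

Here `cumulative` shows how many components account for the overall variability, each column of `contribution` shows how strongly a variable is tied to a component, and each row of `per_variable` shows how many components are needed to explain one measured variable.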

Statistical Discriminant Analysis on the Driving Ability of the Brain-injured

  • Kim, Jae-Hee;Kim, Jeong-A
    • Journal of the Korean Data and Information Science Society / v.16 no.1 / pp.19-31 / 2005
  • Brain-injured patients who held a driver's license before their injury were tested with CPAD, a tool newly developed by Hangyang Medical School and the National Rehabilitation Center. The CPAD contains many variables that measure driving ability. For each patient, the American standard CBDI score was also measured and compared with the CPAD results. The aim is to classify patients into pass, borderline, and fail groups after the CPAD test. To derive discriminant functions using the group information based on CBDI, parametric/nonparametric and multivariate/univariate discriminant analyses were performed and discussed.

The study of the Gifted Students Education about Doing Mathematical Task with the Face Plot (얼굴그림(Face Plot)을 활용한 수학영재교육의 사례연구)

  • Kim, Yunghwan
    • Journal of the Korean School Mathematics Society / v.20 no.4 / pp.369-385 / 2017
  • This study examines the activities and dispositions of gifted students using face plots in exploratory data analysis in a middle school mathematics class. The study is grounded in doing mathematics through multivariate analysis, beyond one- and two-variable analysis. The gifted students developed good learning habits on their own. The results indicate that many gifted students found data analysis with face plots an interesting experience, and that they regarded it as a useful method for thinking creatively about graphics while working on mathematical tasks. I think that teachers need to learn visualization methods and to create and develop STEAM education tasks connected to real life. Doing so should be effective enough to change their attitudes toward teaching and learning in exploratory data analysis.
