• Title/Abstract/Keywords: multivariate data analysis

Search results: 1,405 items (processing time: 0.032 seconds)

MBRDR: R-package for response dimension reduction in multivariate regression

  • Heesung Ahn;Jae Keun Yoo
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 31, No. 2
    • /
    • pp.179-189
    • /
    • 2024
  • In multivariate regression with a high-dimensional response Y ∈ ℝ^r and a relatively low-dimensional predictor X ∈ ℝ^p (where r ≥ 2), statistical analysis presents significant challenges due to the exponential increase in the number of parameters as the dimension of the response grows. Most existing dimension reduction techniques focus on reducing the dimension of the predictors (X), not the dimension of the response (Y). Yoo and Cook (2008) introduced a response dimension reduction method that preserves information about the conditional mean E(Y | X). Building upon this foundational work, Yoo (2018) proposed two semi-parametric methods, principal response reduction (PRR) and principal fitted response reduction (PFRR), and later expanded them to unstructured principal fitted response reduction (UPFRR) (Yoo, 2019). This paper reviews these four response dimension reduction methods and introduces their implementation in the mbrdr package in R. The mbrdr package is a unique tool in the R community, as it is specifically designed for response dimension reduction, setting it apart from existing dimension reduction packages that focus solely on predictors.

A Derivation of Regional Representative Intensity-Duration-Frequency Relationship Using Multivariate Analysis

  • 이정식;조성근;장진욱
    • 한국방재학회 논문집
    • /
    • Vol. 7, No. 2 (Serial No. 25)
    • /
    • pp.13-24
    • /
    • 2007
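  • In this study, multivariate analysis techniques were applied to rainfall in Korea to determine representative probability distributions, and probable rainfall intensity formulas were derived from the selected distributions. The rainfall data were annual maximum series with record lengths of 30 years or more, covering 12 durations (10 minutes and 1, 2, 3, 4, 5, 6, 8, 10, 12, 18, and 24 hours) and 50 rainfall characteristic factors. Fourteen probability distributions widely used in frequency analysis were considered, and principal component analysis and cluster analysis were performed to test rainfall homogeneity across all regions. The results are summarized as follows. First, no single appropriate distribution representing the whole country could be selected, but the country was divided into five regions within which hydrological homogeneity is recognized. Second, the GEV distribution was selected as the representative appropriate distribution for regions I, III, IV, and V, and the Gumbel distribution for region II. Third, the probable rainfall given by the representative distributions differed from that of previous studies. Fourth, representative probable rainfall intensity formulas were derived from the probable rainfall obtained with the representative distributions.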
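As a rough illustration of how a probable rainfall depth follows from a fitted Gumbel distribution (the distribution selected for region II), the sketch below fits the location and scale by the method of moments and evaluates the quantile for a given return period T. The fitting method, the function names, and the sample values are assumptions made for illustration; the abstract does not state how the parameters were estimated.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(sample):
    """Method-of-moments estimates (location, scale) of a Gumbel distribution."""
    scale = stdev(sample) * math.sqrt(6.0) / math.pi
    location = mean(sample) - EULER_GAMMA * scale
    return location, scale


def gumbel_quantile(location, scale, return_period):
    """Value with non-exceedance probability 1 - 1/T under a Gumbel distribution."""
    p = 1.0 - 1.0 / return_period
    return location - scale * math.log(-math.log(p))


# Hypothetical annual maximum 1-hour rainfall depths in mm (not from the paper).
annual_max_mm = [38.0, 45.5, 52.1, 41.3, 60.2, 47.8, 55.0, 43.9, 70.4, 49.6]
loc, scale = fit_gumbel_moments(annual_max_mm)
for T in (2, 10, 50, 100):
    print(f"T = {T:>3} yr: {gumbel_quantile(loc, scale, T):.1f} mm")
```

Repeating this for each duration gives the depth-duration pairs from which an intensity formula could then be fitted.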

Non-Destructive Sorting Techniques for Viable Pepper (Capsicum annuum L.) Seeds Using Fourier Transform Near-Infrared and Raman Spectroscopy

  • Seo, Young-Wook;Ahn, Chi Kook;Lee, Hoonsoo;Park, Eunsoo;Mo, Changyeun;Cho, Byoung-Kwan
    • Journal of Biosystems Engineering
    • /
    • Vol. 41, No. 1
    • /
    • pp.51-59
    • /
    • 2016
  • Purpose: This study examined the performance of two spectroscopy methods and multivariate classification methods in discriminating viable pepper seeds from their non-viable counterparts. Methods: A classification model for viable seeds was developed using partial least squares discriminant analysis (PLS-DA) with Fourier transform near-infrared (FT-NIR) and Raman spectroscopic data in the ranges of 9080-4150 $\mathrm{cm}^{-1}$ (1400-2400 nm) and 1800-970 $\mathrm{cm}^{-1}$, respectively. The datasets were split into 70% for calibration and 30% for validation. To reduce noise in the spectra and compare the classification results, several preprocessing methods were used: mean, maximum, and range normalization, multivariate scattering correction, standard normal variate, and first and second derivatives with the Savitzky-Golay algorithm. Results: The classification accuracies for calibration using FT-NIR and Raman spectroscopy were both 99% with the first derivative, whereas the validation accuracies were 90.5% with both multivariate scattering correction and standard normal variate, and 96.4% with the raw (non-preprocessed) data. Conclusions: These results indicate that FT-NIR and Raman spectroscopy are valuable tools for a feasible classification and evaluation of viable pepper seeds by providing useful information based on PLS-DA and the threshold value.
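Two of the steps named in the abstract are simple enough to sketch directly: the standard normal variate (SNV) transform, which centers and scales each spectrum by its own mean and standard deviation, and a 70/30 calibration/validation split. The function names, the use of a seeded random shuffle, and the toy spectrum are assumptions for illustration; the abstract does not say how samples were assigned to the two sets.

```python
import random
from statistics import mean, stdev


def standard_normal_variate(spectrum):
    """Center one spectrum on its own mean and scale by its own sample std. dev."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(v - m) / s for v in spectrum]


def split_calibration_validation(samples, calibration_fraction=0.7, seed=0):
    """Shuffle the samples reproducibly, then cut them into two sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * calibration_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical absorbance values for a single seed (not from the paper).
raw_spectrum = [0.42, 0.45, 0.51, 0.60, 0.58, 0.49]
print(standard_normal_variate(raw_spectrum))

calibration, validation = split_calibration_validation(range(10))
print(len(calibration), len(validation))
```

SNV is applied per spectrum, so it needs no statistics from the calibration set and can be run before or after the split.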

Racial and Socioeconomic Disparities in Malignant Carcinoid Cancer Cause Specific Survival: Analysis of the Surveillance, Epidemiology and End Results National Cancer Registry

  • Cheung, Rex
    • Asian Pacific Journal of Cancer Prevention
    • /
    • Vol. 14, No. 12
    • /
    • pp.7117-7120
    • /
    • 2013
  • Background: This study hypothesized that living in a poor neighborhood decreases cause-specific survival in individuals with carcinoid carcinomas. Surveillance, Epidemiology and End Results (SEER) carcinoid carcinoma data were used to identify potential socioeconomic disparities in outcome. Materials and Methods: This study analyzed socioeconomic, staging and treatment factors available in the SEER database for carcinoid carcinomas. The Kaplan-Meier method was used to analyze time to events and the Kolmogorov-Smirnov test to compare survival curves. The Cox proportional hazard method was employed for multivariate analysis. Areas under the receiver operating characteristic curves (ROCs) were computed to screen the predictors for further analysis. Results: A total of 38,546 patients diagnosed from 1973 to 2009 were included in this study. The mean follow-up time (S.D.) was 68.1 (70.7) months. SEER stage was the most predictive factor of outcome (ROC area of 0.79), and 16.4% of patients were un-staged. Race/ethnicity, rural-urban residence and county-level family income were significant predictors of cause-specific survival on multivariate analysis, accounting for about 5% of the difference in actuarial cause-specific survival at 20 years of follow-up. Conclusions: This study found poorer cause-specific survival from carcinoid carcinomas among individuals living in poor and rural neighborhoods.
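The Kaplan-Meier method mentioned in the abstract can be sketched in a few lines: at each distinct event time, the running survival probability is multiplied by one minus the fraction of at-risk patients who died at that time. The function name and the five-patient sample are hypothetical and only illustrate the estimator, not the SEER analysis itself.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    times  -- follow-up time per patient (e.g. months)
    events -- 1 if the cause-specific death was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up data (not from the paper).
months = [5, 8, 8, 12, 20]
died = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(months, died):
    print(f"month {t}: S(t) = {s:.2f}")
```

Censored patients leave the risk set without lowering the curve, which is what distinguishes this estimate from a simple proportion of survivors.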

A predictive model for compressive strength of waste LCD glass concrete by nonlinear-multivariate regression

  • Wang, C.C.;Chen, T.T.;Wang, H.Y.;Huang, Chi
    • Computers and Concrete
    • /
    • Vol. 13, No. 4
    • /
    • pp.531-545
    • /
    • 2014
  • The purpose of this paper is to develop a prediction model for the compressive strength of waste LCD glass applied in concrete by analyzing a series of laboratory test results, which were obtained in our previous study. The hyperbolic function was used to perform the nonlinear-multivariate regression analysis of the compressive strength prediction model with the following parameters: water-binder ratio w/b, curing age t, and waste glass content G. According to the relative regression analysis, the compressive strength prediction model is developed. The calculated results are in accord with the laboratory measured data, which are the concrete compressive strengths of different mix proportions. In addition, a coefficient of determination ($R^2$) between 0.93 and 0.96 and a mean absolute percentage error (MAPE) between 5.4% and 8.4% were obtained by regression analysis using the predicted compressive analysis value, and the test results are also excellent. Therefore, the predicted results for compressive strength are highly accurate for waste LCD glass applied in concrete. Additionally, this prediction model exhibits a good predictive capacity when employed to calculate the compressive strength of washed glass sand concrete.
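The two accuracy measures quoted in the abstract, $R^2$ and MAPE, can be computed as in the sketch below. It assumes the common definition $R^2 = 1 - SS_{res}/SS_{tot}$; the abstract does not say which definition the authors used, and the strength values are hypothetical.

```python
def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def mape(measured, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs((y - p) / y) for y, p in zip(measured, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical compressive strengths in MPa (not from the paper).
measured_mpa = [20.0, 30.0, 40.0]
predicted_mpa = [22.0, 27.0, 40.0]
print(f"R^2  = {r_squared(measured_mpa, predicted_mpa):.3f}")
print(f"MAPE = {mape(measured_mpa, predicted_mpa):.2f}%")
```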

Study on the Comparison and Analysis of Data Mining Models for the Efficient Customer Credit Evaluation

  • 김갑식
    • Journal of Information Technology Applications and Management
    • /
    • Vol. 11, No. 1
    • /
    • pp.161-174
    • /
    • 2004
  • This study is intended to suggest the optimized data mining model for the efficient customer credit evaluation in the capital finance industry. To accomplish the research objective, various data mining models for the customer credit evaluation are compared and analyzed. Furthermore, existing models such as Multi-Layered Perceptrons, Multivariate Discrimination Analysis, Radial Basis Function, Decision Tree, and Logistic Regression are employed for analyzing the customer information in the capital finance market and the detailed data of capital financing transactions. Finally, the results from the integrated model utilizing a genetic algorithm are compared with those of each individual model mentioned above. The results reveal that the integrated model is superior to the other existing models.

A Study on Measuring the Similarity Among Sampling Sites in Lake Yongdam with Water Quality Data Using Multivariate Techniques

  • 이요상;권세혁
    • 환경영향평가
    • /
    • Vol. 18, No. 6
    • /
    • pp.401-409
    • /
    • 2009
  • Multivariate statistical approaches are discussed for classifying sampling sites by the similarity of their water quality data and for understanding the characteristics of the resulting clusters, with the aim of an optimal water quality monitoring network. For the empirical study, two years of data (2005 and 2006) were collected in Yongdam reservoir at 9 sampling sites, combined with 2 depth levels, on 7 important variables related to water quality. The similarity among sampling sites is measured with Euclidean distances between the water quality variables, and the sites are classified by a hierarchical clustering method. The clustered sites are discussed using principal component variables, in view of their geographical characteristics and of reducing the number of measuring sites. The nine sampling sites are clustered as follows. Sites 5, 6, and 7 form one cluster characterized by shallow water depth and location on the main stream. Sites 2 and 4 are clustered into the same group by hydraulic characteristics that derive from the main stream, but their water quality appears to change in different patterns because site 2 is near the dam. Sites 3, 8, and 9 each stand alone because they lie on different tributaries.
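The site-classification step can be sketched as Euclidean distances between per-site variable vectors followed by agglomerative clustering. The abstract does not name the linkage rule, so single linkage is used here purely as an assumption, and the site labels and values are hypothetical (and assumed to be already standardized so that no variable dominates the distance).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length variable vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    points -- dict mapping a site label to its vector of variables
    """
    clusters = [[name] for name in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical standardized water quality vectors (not from the paper).
sites = {
    "S1": (0.0, 0.1, -0.2),
    "S2": (0.1, 0.0, -0.1),
    "S3": (2.0, 1.8, 2.1),
    "S4": (2.1, 2.0, 1.9),
}
print(single_linkage(sites, n_clusters=2))
```

Other linkage rules (complete, average, Ward) change only the line that computes `d`, and can yield different groupings on the same distances.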

Prediction of the compressive strength of self-compacting concrete using surrogate models

  • Asteris, Panagiotis G.;Ashrafian, Ali;Rezaie-Balf, Mohammad
    • Computers and Concrete
    • /
    • Vol. 24, No. 2
    • /
    • pp.137-150
    • /
    • 2019
  • In this paper, surrogate models such as multivariate adaptive regression splines (MARS) and the M5P model tree (M5P MT) have been investigated in order to propose a new formulation for the 28-day compressive strength of self-compacting concrete (SCC) incorporating metakaolin as a supplementary cementitious material. A database of experimental data has been assembled from several published papers and used for training and testing. In particular, the data are arranged in a format of seven input parameters covering contents of cement, coarse aggregate to fine aggregate ratio, water, metakaolin, super plasticizer, largest maximum size and binder, as well as one output parameter, which is the 28-day compressive strength. The efficiency of the proposed techniques has been demonstrated by means of certain statistical criteria. The findings have been compared to experimental results, and the comparison shows that the MARS and M5P MT approaches predict the compressive strength of SCC incorporating metakaolin with great precision. The sensitivity analysis performed to identify the effective parameters for the 28-day compressive strength indicates that cementitious binder content is the most effective variable in the mixture.

An Outlier Detection Algorithm and Data Integration Technique for Prediction of Hypertension

  • 홍고르출;김미혜;송미화
    • 한국정보처리학회:학술대회논문집
    • /
    • Korea Information Processing Society 2023 Spring Conference
    • /
    • pp.417-419
    • /
    • 2023
  • Hypertension is one of the leading causes of mortality worldwide. In recent years, the incidence of hypertension has increased dramatically, not only among the elderly but also among young people. In this regard, the use of machine-learning methods to diagnose the causes of hypertension has increased in recent years. In this study, we improved the prediction of hypertension detection using Mahalanobis distance-based multivariate outlier removal using the KNHANES database from the Korean national health data and the COVID-19 dataset from Kaggle. This study was divided into two modules. Initially, the data preprocessing step used merged datasets and decision-tree classifier-based feature selection. The next module applies a predictive analysis step to remove multivariate outliers using the Mahalanobis distance from the experimental dataset and makes a prediction of hypertension. In this study, we compared the accuracy of each classification model. The best results showed that the proposed MAH_RF algorithm had an accuracy of 82.66%. The proposed method can be used not only for hypertension but also for the detection of various diseases such as stroke and cardiovascular disease.
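Mahalanobis distance-based outlier removal, as described in the abstract, can be sketched for the two-variable case, where the covariance matrix can be inverted by hand. Restricting to two features, the chi-square cutoff at probability 0.975, and all function names are assumptions for illustration; the abstract does not give the threshold or the feature set used with KNHANES.

```python
import math
from statistics import mean


def mean_and_covariance(rows):
    """Sample mean and covariance entries (sxx, sxy, syy) of 2-D rows."""
    n = len(rows)
    mx = mean(r[0] for r in rows)
    my = mean(r[1] for r in rows)
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(row, center, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = row[0] - center[0], row[1] - center[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def remove_outliers(rows, keep_probability=0.975):
    """Drop rows whose squared distance exceeds the chi-square(2) quantile."""
    # With 2 degrees of freedom the chi-square quantile is -2 * ln(1 - p).
    threshold = -2.0 * math.log(1.0 - keep_probability)
    center, cov = mean_and_covariance(rows)
    return [r for r in rows if mahalanobis_sq(r, center, cov) <= threshold]


# Synthetic data: 20 clustered points plus one distant point.
rows = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)] * 5 + [(10.0, 10.0)]
print(len(rows), "->", len(remove_outliers(rows)))
```

The filtered rows would then be passed to the classifier. With more than two features the same idea applies, but the covariance inverse needs a linear algebra routine.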

Predisposing, Enabling, and Reinforcing Factors of COVID-19 Prevention Behavior in Indonesia: A Mixed-methods Study

  • Putri Winda Lestari;Lina Agestika;Gusti Kumala Dewi
    • Journal of Preventive Medicine and Public Health
    • /
    • Vol. 56, No. 1
    • /
    • pp.21-30
    • /
    • 2023
  • Objectives: To prevent the spread of coronavirus disease 2019 (COVID-19), behaviors such as mask-wearing, social distancing, decreasing mobility, and avoiding crowds have been suggested, especially in high-risk countries such as Indonesia. Unfortunately, the level of compliance with those practices has been low. This study was conducted to determine the predisposing, enabling, and reinforcing factors of COVID-19 prevention behavior in Indonesia. Methods: This cross-sectional study used a mixed-methods approach. The participants were 264 adults from 21 provinces in Indonesia recruited through convenience sampling. Data were collected using a Google Form and in-depth interviews. Statistical analysis included univariate, bivariate, and multivariate logistic regression. Furthermore, qualitative data analysis was done through content analysis and qualitative data management using Atlas.ti software. Results: Overall, 44.32% of respondents were non-compliant with recommended COVID-19 prevention behaviors. In multivariate logistic regression analysis, low-to-medium education level, poor attitude, insufficient involvement of leaders, and insufficient regulation were also associated with decreased community compliance. Based on in-depth interviews with informants, the negligence of the Indonesian government in the initial stages of the COVID-19 pandemic may have contributed to the unpreparedness of the community to face the pandemic, as people were not aware of the importance of preventive practices. Conclusions: Education level is not the only factor influencing community compliance with recommended COVID-19 prevention behaviors. Changing attitudes through health promotion to increase public awareness and encouraging voluntary community participation through active risk communication are necessary. Regulations and role leaders are also required to improve COVID-19 prevention behavior.