• Title/Summary/Keyword: principal component regression

Search Result 253, Processing Time 0.025 seconds

A Study on Predictive Models based on the Machine Learning for Evaluating the Extent of Hazardous Zone of Explosive Gases (기계학습 기반의 가스폭발위험범위 예측모델에 관한 연구)

  • Jung, Yong Jae;Lee, Chang Jun
    • Korean Chemical Engineering Research
    • /
    • v.58 no.2
    • /
    • pp.248-256
    • /
    • 2020
  • In this study, predictive models based on machine learning for evaluating the extent of hazardous zone of explosive gases are developed. They are able to provide important guidelines for installing the explosion proof apparatus. 1,200 research data sets including 12 combustible gases and their extents of hazardous zone are generated to train predictive models. The extent of hazardous zone is set to an output variable and 12 variables affecting an output are set as input variables. Multiple linear regression, principal component regression, and artificial neural network are employed to train predictive models. Mean absolute percentage errors of multiple linear regression, principal component regression, and artificial neural network are 44.2%, 49.3%, and 5.7% and root mean square errors are 1.389m, 1.602m, and 0.203 m respectively. Therefore, it can be concluded that the artificial neural network shows the best performance. This model can be easily used to evaluate the extent of hazardous zone for explosive gases.

MOISTURE CONTENT MEASUREMENT OF POWDERED FOOD USING RF IMPEDANCE SPECTROSCOPIC METHOD

  • Kim, K. B.;Lee, J. W.;S. H. Noh;Lee, S. S.
    • Proceedings of the Korean Society for Agricultural Machinery Conference
    • /
    • 2000.11b
    • /
    • pp.188-195
    • /
    • 2000
  • This study was conducted to measure the moisture content of powdered food using RF impedance spectroscopic method. In frequency range of 1.0 to 30㎒, the impedance such as reactance and resistance of parallel plate type sample holder filled with wheat flour and red-pepper powder of which moisture content range were 5.93∼-17.07%w.b. and 10.87 ∼ 27.36%w.b., respectively, was characterized using by Q-meter (HP4342). The reactance was a better parameter than the resistance in estimating the moisture density defined as product of moisture content and bulk density which was used to eliminate the effect of bulk density on RF spectral data in this study. Multivariate data analyses such as principal component regression, partial least square regression and multiple linear regression were performed to develop one calibration model having moisture density and reactance spectral data as parameters for determination of moisture content of both wheat flour and red-pepper powder. The best regression model was one by the multiple linear regression model. Its performance for unknown data of powdered food was showed that the bias, standard error of prediction and determination coefficient are 0.179% moisture content, 1.679% moisture content and 0.8849, respectively.

  • PDF

Development of Regression Models Resolving High-Dimensional Data and Multicollinearity Problem for Heavy Rain Damage Data (호우피해자료에서의 고차원 자료 및 다중공선성 문제를 해소한 회귀모형 개발)

  • Kim, Jeonghwan;Park, Jihyun;Choi, Changhyun;Kim, Hung Soo
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.38 no.6
    • /
    • pp.801-808
    • /
    • 2018
  • The learning of the linear regression model is stable on the assumption that the sample size is sufficiently larger than the number of explanatory variables and there is no serious multicollinearity between explanatory variables. In this study, we investigated the difficulty of model learning when the assumption was violated by analyzing a real heavy rain damage data and we proposed to use a principal component regression model or a ridge regression model after integrating data to overcome the difficulty. We evaluated the predictive performance of the proposed models by using the test data independent from the training data, and confirmed that the proposed methods showed better predictive performances than the linear regression model.

Bayesian inference of the cumulative logistic principal component regression models

  • Kyung, Minjung
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.2
    • /
    • pp.203-223
    • /
    • 2022
  • We propose a Bayesian approach to cumulative logistic regression model for the ordinal response based on the orthogonal principal components via singular value decomposition considering the multicollinearity among predictors. The advantage of the suggested method is considering dimension reduction and parameter estimation simultaneously. To evaluate the performance of the proposed model we conduct a simulation study with considering a high-dimensional and highly correlated explanatory matrix. Also, we fit the suggested method to a real data concerning sprout- and scab-damaged kernels of wheat and compare it to EM based proportional-odds logistic regression model. Compared to EM based methods, we argue that the proposed model works better for the highly correlated high-dimensional data with providing parameter estimates and provides good predictions.

SVM-Guided Biplot of Observations and Variables

  • Huh, Myung-Hoe
    • Communications for Statistical Applications and Methods
    • /
    • v.20 no.6
    • /
    • pp.491-498
    • /
    • 2013
  • We consider support vector machines(SVM) to predict Y with p numerical variables $X_1$, ${\ldots}$, $X_p$. This paper aims to build a biplot of p explanatory variables, in which the first dimension indicates the direction of SVM classification and/or regression fits. We use the geometric scheme of kernel principal component analysis adapted to map n observations on the two-dimensional projection plane of which one axis is determined by a SVM model a priori.

Suggestion of batter ability index in Korea baseball - focusing on the sabermetrics statistics WAR (한국프로야구에서 타자능력지수 제안 - 대체선수대비승수(WAR)을 중심으로)

  • Lee, Jea-Young;Kim, Hyeon-Gyu
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.7
    • /
    • pp.1271-1281
    • /
    • 2016
  • Wins above replacement (WAR) is one of the most widely used statistic among sabermatrics statistics that measure the ability of a batter in baseball. WAR has a great advantage that is to represent the attack power of the player and the base running ability, defensive ability as a single value. In this study, we proposed a hitter ability index using the sabermetrics statistics that can replace WAR based on Korea Baseball Record Data of the last three years (2013-2015). First, we calculated Batter ability index through the arithmetic mean method, the weighted average method, principal component regression and selected the method that had high correlation with WAR.

LMS and LTS-type Alternatives to Classical Principal Component Analysis

  • Huh, Myung-Hoe;Lee, Yong-Goo
    • Communications for Statistical Applications and Methods
    • /
    • v.13 no.2
    • /
    • pp.233-241
    • /
    • 2006
  • Classical principal component analysis (PCA) can be formulated as finding the linear subspace that best accommodates multidimensional data points in the sense that the sum of squared residual distances is minimized. As alternatives to such LS (least squares) fitting approach, we produce LMS (least median of squares) and LTS (least trimmed squares)-type PCA by minimizing the median of squared residual distances and the trimmed sum of squares, in a similar fashion to Rousseeuw (1984)'s alternative approaches to LS linear regression. Proposed methods adopt the data-driven optimization algorithm of Croux and Ruiz-Gazen (1996, 2005) that is conceptually simple and computationally practical. Numerical examples are given.

Prediction of Melting Point for Drug-like Compounds Using Principal Component-Genetic Algorithm-Artificial Neural Network

  • Habibi-Yangjeh, Aziz;Pourbasheer, Eslam;Danandeh-Jenagharad, Mohammad
    • Bulletin of the Korean Chemical Society
    • /
    • v.29 no.4
    • /
    • pp.833-841
    • /
    • 2008
  • Principal component-genetic algorithm-multiparameter linear regression (PC-GA-MLR) and principal component-genetic algorithm-artificial neural network (PC-GA-ANN) models were applied for prediction of melting point for 323 drug-like compounds. A large number of theoretical descriptors were calculated for each compound. The first 234 principal components (PC’s) were found to explain more than 99.9% of variances in the original data matrix. From the pool of these PC’s, the genetic algorithm was employed for selection of the best set of extracted PC’s for PC-MLR and PC-ANN models. The models were generated using fifteen PC’s as variables. For evaluation of the predictive power of the models, melting points of 64 compounds in the prediction set were calculated. Root-mean square errors (RMSE) for PC-GA-MLR and PC-GA-ANN models are 48.18 and $12.77{^{\circ}C}$, respectively. Comparison of the results obtained by the models reveals superiority of the PC-GA-ANN relative to the PC-GA-MLR and the recently proposed models (RMSE = $40.7{^{\circ}C}$). The improvements are due to the fact that the melting point of the compounds demonstrates non-linear correlations with the principal components.

Performance Improvement of General Regression Neural Network Using Principal Component Analysis (주요성분분석에 의한 일반회귀 신경망의 성능개선)

  • Cho, Yong-Hyun
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.11
    • /
    • pp.3408-3416
    • /
    • 2000
  • This paper proposes an efficient method for improving the performance of a general regression neural network by using the feature to the independent variables as the center for partern-layer neurons. The adaptive principal component analysis is applied for extracting, efficiently the fcarures by reducing the dimension of given independent variables. In can acluevc a supertor property of the principal component analysis that converts input data into set of statistically independent features and the general regression neuralnetwork, espedtively. The proposed general regression neural network has been applied to regress the Solow's economy(2-independent variable set) and the wie elephone(1-independent vanable set). The simulation results show that the proposed meural networks have better performances of the regressionfor the lest data, in comparison with those using the means or the weighted means of independent variables. Also,it is affected less by the number of neurons and the scope of the smoothing factor.

  • PDF

Design of Regression Model and Pattern Classifier by Using Principal Component Analysis (주성분 분석법을 이용한 회귀다항식 기반 모델 및 패턴 분류기 설계)

  • Roh, Seok-Beom;Lee, Dong-Yoon
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.10 no.6
    • /
    • pp.594-600
    • /
    • 2017
  • The new design methodology of prediction model and pattern classification, which is based on the dimension reduction algorithm called principal component analysis, is introduced in this paper. Principal component analysis is one of dimension reduction techniques which are used to reduce the dimension of the input space and extract some good features from the original input variables. The extracted input variables are applied to the prediction model and pattern classifier as the input variables. The introduced prediction model and pattern classifier are based on the very simple regression which is the key point of the paper. The structural simplicity of the prediction model and pattern classifier leads to reducing the over-fitting problem. In order to validate the proposed prediction model and pattern classifier, several machine learning data sets are used.