• Title/Summary/Keyword: Principal Components Regression

Search Result 84, Processing Time 0.023 seconds

Principal Components Regression in Logistic Model (로지스틱모형에서의 주성분회귀)

  • Kim, Bu-Yong;Kahng, Myung-Wook
    • The Korean Journal of Applied Statistics
    • /
    • v.21 no.4
    • /
    • pp.571-580
    • /
    • 2008
  • The logistic regression analysis is widely used in the area of customer relationship management and credit risk management. It is well known that the maximum likelihood estimation is not appropriate when multicollinearity exists among the regressors. Thus we propose the logistic principal components regression to deal with the multicollinearity problem. In particular, new method is suggested to select proper principal components. The selection method is based on the condition index instead of the eigenvalue. When a condition index is larger than the upper limit of cutoff value, principal component corresponding to the index is removed from the estimation. And hypothesis test is sequentially employed to eliminate the principal component when a condition index is between the upper limit and the lower limit. The limits are obtained by a linear model which is constructed on the basis of the conjoint analysis. The proposed method is evaluated by means of the variance of the estimates and the correct classification rate. The results indicate that the proposed method is superior to the existing method in terms of efficiency and goodness of fit.

Regional Geological Mapping by Principal Component Analysis of the Landsat TM Data in a Heavily Vegetated Area (식생이 무성한 지역에서의 Principal Component Analysis 에 의한 Landsat TM 자료의 광역지질도 작성)

  • 朴鍾南;徐延熙
    • Korean Journal of Remote Sensing
    • /
    • v.4 no.1
    • /
    • pp.49-60
    • /
    • 1988
  • Principal Component Analysis (PCA) was applied for regional geological mapping to a multivariate data set of the Landsat TM data in the heavily vegetated and topographically rugged Chungju area. The multivariate data set selection was made by statistical analysis based on the magnitude of regression of squares in multiple regression, and it includes R1/2/R3/4, R2/3, R5/7/R4/3, R1/2, R3/4. R4/3. AND R4/5. As a result of application of PCA, some of later principal components (in this study PC 3 and PC 5) are geologically more significant than earlier major components, PC 1 and PC 2 herein. The earlier two major components which comprise 96% of the total information of the data set, mainly represent reflectance of vegetation and topographic effects, while though the rest represent 3% of the total information which statistically indicates the information unstable, geological significance of PC3 and PC5 in the study implies that application of the technique in more favorable areas should lead to much better results.

Bayesian inference of the cumulative logistic principal component regression models

  • Kyung, Minjung
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.2
    • /
    • pp.203-223
    • /
    • 2022
  • We propose a Bayesian approach to cumulative logistic regression model for the ordinal response based on the orthogonal principal components via singular value decomposition considering the multicollinearity among predictors. The advantage of the suggested method is considering dimension reduction and parameter estimation simultaneously. To evaluate the performance of the proposed model we conduct a simulation study with considering a high-dimensional and highly correlated explanatory matrix. Also, we fit the suggested method to a real data concerning sprout- and scab-damaged kernels of wheat and compare it to EM based proportional-odds logistic regression model. Compared to EM based methods, we argue that the proposed model works better for the highly correlated high-dimensional data with providing parameter estimates and provides good predictions.

Principal Components Logistic Regression based on Robust Estimation (로버스트추정에 바탕을 둔 주성분로지스틱회귀)

  • Kim, Bu-Yong;Kahng, Myung-Wook;Jang, Hea-Won
    • The Korean Journal of Applied Statistics
    • /
    • v.22 no.3
    • /
    • pp.531-539
    • /
    • 2009
  • Logistic regression is widely used as a datamining technique for the customer relationship management. The maximum likelihood estimator has highly inflated variance when multicollinearity exists among the regressors, and it is not robust against outliers. Thus we propose the robust principal components logistic regression to deal with both multicollinearity and outlier problem. A procedure is suggested for the selection of principal components, which is based on the condition index. When a condition index is larger than the cutoff value obtained from the model constructed on the basis of the conjoint analysis, the corresponding principal component is removed from the logistic model. In addition, we employ an algorithm for the robust estimation, which strives to dampen the effect of outliers by applying the appropriate weights and factors to the leverage points and vertical outliers identified by the V-mask type criterion. The Monte Carlo simulation results indicate that the proposed procedure yields higher rate of correct classification than the existing method.

Algorithm for Finding the Best Principal Component Regression Models for Quantitative Analysis using NIR Spectra (근적외 스펙트럼을 이용한 정량분석용 최적 주성분회귀모델을 얻기 위한 알고리듬)

  • Cho, Jung-Hwan
    • Journal of Pharmaceutical Investigation
    • /
    • v.37 no.6
    • /
    • pp.377-395
    • /
    • 2007
  • Near infrared(NIR) spectral data have been used for the noninvasive analysis of various biological samples. Nonetheless, absorption bands of NIR region are overlapped extensively. It is very difficult to select the proper wavelengths of spectral data, which give the best PCR(principal component regression) models for the analysis of constituents of biological samples. The NIR data were used after polynomial smoothing and differentiation of 1st order, using Savitzky-Golay filters. To find the best PCR models, all-possible combinations of available principal components from the given NIR spectral data were derived by in-house programs written in MATLAB codes. All of the extensively generated PCR models were compared in terms of SEC(standard error of calibration), $R^2$, SEP(standard error of prediction) and SECP(standard error of calibration and prediction) to find the best combination of principal components of the initial PCR models. The initial PCR models were found by SEC or Malinowski's indicator function and a priori selection of spectral points were examined in terms of correlation coefficients between NIR data at each wavelength and corresponding concentrations. For the test of the developed program, aqueous solutions of BSA(bovine serum albumin) and glucose were prepared and analyzed. As a result, the best PCR models were found using a priori selection of spectral points and the final model selection by SEP or SECP.

Predicting Korea Pro-Baseball Rankings by Principal Component Regression Analysis (주성분회귀분석을 이용한 한국프로야구 순위)

  • Bae, Jae-Young;Lee, Jin-Mok;Lee, Jea-Young
    • Communications for Statistical Applications and Methods
    • /
    • v.19 no.3
    • /
    • pp.367-379
    • /
    • 2012
  • In baseball rankings, prediction has been a subject of interest for baseball fans. To predict these rankings, (based on 2011 data from Korea Professional Baseball records) the arithmetic mean method, the weighted average method, principal component analysis, and principal component regression analysis is presented. By standardizing the arithmetic average, the correlation coefficient using the weighted average method, using principal components analysis to predict rankings, the final model was selected as a principal component regression model. By practicing regression analysis with a reduced variable by principal component analysis, we propose a rank predictability model of a pitcher part, a batter part and a pitcher batter part. We can estimate a 2011 rank of pro-baseball by a predicted regression model. By principal component regression analysis, the pitcher part, the other part, the pitcher and the batter part of the ranking prediction model is proposed. The regression model predicts the rankings for 2012.

Supervised Learning-Based Collaborative Filtering Using Market Basket Data for the Cold-Start Problem

  • Hwang, Wook-Yeon;Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems
    • /
    • v.13 no.4
    • /
    • pp.421-431
    • /
    • 2014
  • The market basket data in the form of a binary user-item matrix or a binary item-user matrix can be modelled as a binary classification problem. The binary logistic regression approach tackles the binary classification problem, where principal components are predictor variables. If users or items are sparse in the training data, the binary classification problem can be considered as a cold-start problem. The binary logistic regression approach may not function appropriately if the principal components are inefficient for the cold-start problem. Assuming that the market basket data can also be considered as a special regression problem whose response is either 0 or 1, we propose three supervised learning approaches: random forest regression, random forest classification, and elastic net to tackle the cold-start problem, comparing the performance in a variety of experimental settings. The experimental results show that the proposed supervised learning approaches outperform the conventional approaches.

Deletion diagnostics in fitting a given regression model to a new observation

  • Kim, Myung Geun
    • Communications for Statistical Applications and Methods
    • /
    • v.23 no.3
    • /
    • pp.231-239
    • /
    • 2016
  • A graphical diagnostic method based on multiple case deletions in a regression context is introduced by using the sampling distribution of the difference between two least squares estimators with and without multiple cases. Principal components analysis plays a key role in deriving this diagnostic method. Multiple case deletions of test statistic are also considered when a new observation is fitted to a given regression model. The result is useful for detecting influential observations in econometric data analysis, for example in checking whether the consumption pattern at a later time is the same as the one found before or not, as well as for investigating the influence of cases in the usual regression model. An illustrative example is given.

An Empirical Study on the Activation Approach for the Competitive Power of Korean Shipping Company in the Korea-China Liner Routes (국적선사의 경쟁력 강화를 위한 한중정기항로 활성화 방안에 대한 실증연구)

  • Lee, Yong-Ho
    • Journal of Navigation and Port Research
    • /
    • v.27 no.2
    • /
    • pp.163-170
    • /
    • 2003
  • This empirical study takes the activation approach for the competitive power of Korean shipping companies in the Korea-China liner routes. Data for this study were collected from Korea/ China/ 3rd flag shipping companies through the 500 questionnaires. The data of 250 respondents were analyzed statistically to verify the hypotheses and to induce Regression Equation which could predicts the influencing level of the determinants to competitive advantage for Korean shipping companies on Korea-China Liner Shipping Routes. Factor Analysis/ Cronbach's Alpha/ Principal Analysis/ Multiple Regression Analysis were used in order to test the hypotheses for the empirical study.

Quantitative Analysis for Biomass Energy Problem Using a Radial Basis Function Neural Network (RBF 뉴럴네트워크를 사용한 바이오매스 에너지문제의 계량적 분석)

  • Baek, Seung Hyun;Hwang, Seung-June
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.36 no.4
    • /
    • pp.59-63
    • /
    • 2013
  • In biomass gasification, efficiency of energy quantification is a difficult part without finishing the process. In this article, a radial basis function neural network (RBFN) is proposed to predict biomass efficiency before gasification. RBFN will be compared with a principal component regression (PCR) and a multilayer perceptron neural network (MLPN). Due to the high dimensionality of data, principal component transform is first used in PCR and afterwards, ordinary regression is applied to selected principal components for modeling. Multilayer perceptron neural network (MLPN) is also used without any preprocessing. For this research, 3 wood samples and 3 other feedstock are used and they are near infrared (NIR) spectrum data with high-dimensionality. Ash and char are used as response variables. The comparison results of two responses will be shown.