• Title/Abstract/Keywords: principal component regression model

106 search results (processing time: 0.023 seconds)

Predicting Korea Pro-Baseball Rankings by Principal Component Regression Analysis

  • 배재영;이진목;이제영
    • Communications for Statistical Applications and Methods / Vol. 19, No. 3 / pp.367-379 / 2012
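  • Predicting rankings in baseball is of interest to baseball fans. To predict these rankings, we present an arithmetic mean method, a weighted mean method, a principal component analysis method, and a principal component regression method, based on 2011 Korean professional baseball records. Rankings are predicted using the arithmetic mean of standardized values, a weighted mean based on correlation coefficients, and principal component analysis, and the principal component regression model was selected as the final model. Regression is performed on the variables reduced by principal component analysis, and ranking prediction models are proposed for pitching, for batting, and for pitching and batting combined. The fitted regression models make it possible to predict the 2012 rankings.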

Simple principal component analysis using Lasso

  • 박철용
    • Journal of the Korean Data and Information Science Society / Vol. 24, No. 3 / pp.533-541 / 2013
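  • This study proposes a simple principal component analysis using Lasso. The method consists of two steps. First, principal components are obtained by principal component analysis. Next, with each principal component as the response variable and the original data as the explanatory variables, regression coefficient estimates are obtained from a Lasso regression model. New principal components based on these coefficient estimates are then used. Because of the properties of Lasso regression, the coefficient estimates can more easily become exactly 0, so the method has the advantage of being easy to interpret. This is because the regression coefficients of a regression model with a principal component as the response and the original data as the explanatory variables are the eigenvector. The usefulness of the method is shown by applying it to simulated data and real data using an R package for Lasso regression.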
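
The two steps described in this abstract can be sketched as follows. This is a minimal illustration, not the author's implementation (the paper uses an R package): the function names `lasso_cd` and `lasso_pca`, the coordinate-descent solver, the penalty value `lam`, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                      # residual y - Xb, with b = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]       # remove column j's current contribution
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]       # add the updated contribution back
    return b

def lasso_pca(X, n_components, lam):
    """Step 1: ordinary PCA. Step 2: Lasso-regress each score on the data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T           # p x k matrix of eigenvectors (loadings)
    scores = Xc @ V                   # principal component scores
    B = np.column_stack(
        [lasso_cd(Xc, scores[:, k], lam) for k in range(n_components)]
    )
    return V, B                       # B holds the sparse replacement loadings

# Hypothetical data: two latent factors, each driving three observed variables.
rng = np.random.default_rng(0)
f = rng.normal(size=(200, 2))
X_demo = np.hstack([f[:, [0]] + 0.3 * rng.normal(size=(200, 3)),
                    f[:, [1]] + 0.3 * rng.normal(size=(200, 3))])
V_demo, B_demo = lasso_pca(X_demo, n_components=2, lam=0.1)
print(np.round(V_demo, 2))            # dense PCA loadings
print(np.round(B_demo, 2))            # Lasso-based loadings
```

With `lam=0` the Lasso step reduces to least squares and returns the eigenvectors themselves, which is the fact the abstract relies on; increasing `lam` trades fidelity to the original components for more zero loadings.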

Estimation of S&T Knowledge Production Function Using Principal Component Regression Model

  • 박수동;성웅현
    • 기술혁신학회지 / Vol. 13, No. 2 / pp.231-251 / 2010
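  • Many factors influence the production of SCI papers and patents, the representative outputs of science and technology (S&T) R&D activity: research funding, number of researchers, knowledge stock (R&D stock, paper stock, patent stock, etc.), research environment, degree of openness, human capital, GDP, and others. When an ordinary regression model is used to estimate the factors affecting paper or patent production, multicollinearity arises among the production factors and leads to estimation errors. In this paper, a principal component regression model is used to resolve the multicollinearity among the factors affecting S&T knowledge production. The factors affecting scientific output (with SCI papers as the output) and technological output (with patents as the output) were compared across three cases using an ordinary regression model and a principal component regression model. When the ordinary regression model was used to analyze the factors affecting the production of SCI papers and patents, multicollinearity among the factors was very high, which caused errors in the estimation and testing of the regression coefficients. In contrast, with the principal component regression model the multicollinearity problem was resolved and the effects of the individual production factors could be estimated appropriately. The method proposed in this paper for estimating the S&T knowledge production function with a principal component regression model should be useful in regression analyses that include a small number of production factors with strong multicollinearity.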

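
The general idea behind principal component regression under multicollinearity can be sketched as below. This is a generic illustration rather than the model fitted in the paper: `fit_pcr`, `predict_pcr`, and the simulated predictors are hypothetical, and the predictors are standardized before the decomposition as an assumption.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Regress y on the leading principal components of standardized X,
    then map the coefficients back to the original predictors."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - x_mean) / x_std
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_components].T                  # leading eigenvectors
    T = Z @ V                                # mutually orthogonal scores
    gamma, *_ = np.linalg.lstsq(T, y - y.mean(), rcond=None)
    beta = (V @ gamma) / x_std               # coefficients on the raw scale
    intercept = y.mean() - x_mean @ beta
    return intercept, beta

def predict_pcr(model, X):
    intercept, beta = model
    return intercept + X @ beta

# Hypothetical data: the first two predictors are almost identical.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X_demo = np.column_stack([x1,
                          x1 + 0.01 * rng.normal(size=100),
                          rng.normal(size=100)])
y_demo = X_demo.sum(axis=1) + 0.5 * rng.normal(size=100)

print(fit_pcr(X_demo, y_demo, 3)[1])  # all components: equivalent to OLS
print(fit_pcr(X_demo, y_demo, 2)[1])  # smallest-variance direction discarded
```

Keeping every component reproduces ordinary least squares, so the benefit comes entirely from dropping the low-variance directions along which collinear predictors cannot be told apart.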

Bayesian Typhoon Track Prediction Using Wind Vector Data

  • Han, Minkyu;Lee, Jaeyong
    • Communications for Statistical Applications and Methods / Vol. 22, No. 3 / pp.241-253 / 2015
  • In this paper we predict the tracks of typhoons using a Bayesian principal component regression model based on wind field data. Data are obtained at each time point, and we apply the Bayesian principal component regression model to predict the track at each time point. For the regression model, we use a variable selection prior and two kinds of prior distributions: normal and Laplace. We show prediction results based on the Bayesian Model Averaging (BMA) estimator and the Median Probability Model (MPM) estimator. We analyze 8 typhoons in 2006 using data obtained from the previous 6 years (2000-2005). We compare our prediction results with those of the moving-nest typhoon model (MTM) proposed by the Korea Meteorological Administration. We posit that it is possible to predict the track of a typhoon accurately using only a statistical model, without a dynamical model.

Water Demand Forecasting by Characteristics of City Using Principal Component and Cluster Analyses

  • Choi, Tae-Ho;Kwon, O-Eun;Koo, Ja-Yong
    • Environmental Engineering Research / Vol. 15, No. 3 / pp.135-140 / 2010
  • Because urban characteristics vary from city to city, the existing approach to water demand prediction, which uses average liters per capita per day, cannot achieve an accurate prediction because it fails to consider several variables. Thus, this study considered social and industrial factors of 164 local cities, in addition to population and other directly influential factors, and used principal component and cluster analyses to develop a more efficient water demand prediction model that considers the unique characteristics of each city. After clustering, multiple regression models were developed: the $R^2$ value of the inclusive multiple regression model was 0.59, whereas those of Clusters A and B were 0.62 and 0.74, respectively. Thus, the cluster-specific multiple regression models were considered more reasonable and valid than the inclusive multiple regression model. In summary, the water demand prediction model that uses principal component and cluster analyses as the basis for classifying localities has a higher coefficient of determination than the inclusive multiple regression model, which does not consider localities.

Application of Regression Analysis Model to TOC Concentration Estimation - Osu Stream Watershed -

  • 박진환;문명진;한성욱;이형진;정수정;황경섭;김갑순
    • 환경영향평가 / Vol. 23, No. 3 / pp.187-196 / 2014
  • The objective of this study is to evaluate and analyze the water environment system of the Osu stream watershed. The data, collected from January 2009 to December 2011, include water temperature, pH, DO, EC, BOD, COD, TOC, SS, T-N, T-P and discharge. The data were used for principal component analysis and factor analysis. The results are as follows. The primary factors obtained from both the principal component analysis and the factor analysis were BOD, COD, TOC, SS and T-P. The results of the principal component analysis and factor analysis were then applied to both a simple regression model and a multiple regression model. Regression models were developed for case 1, using concentrations of water quality parameters, and case 2, using delivery loads. The coefficient of determination for case 1 fell between 0.629 and 0.866, lower than that for case 2, which fell between 0.946 and 0.998. Therefore, the case 2 model would be the more reliable choice. The coefficient of determination between the values estimated by applying the regression model to the 2012 data and the actual measurements was over 0.6 overall, indicating a high correlation between the two. The same model can be applied to estimate TOC concentrations in the future.

Model-based inverse regression for mixture data

  • Choi, Changhwan;Park, Chongsun
    • Communications for Statistical Applications and Methods / Vol. 24, No. 1 / pp.97-113 / 2017
  • This paper proposes a method for sufficient dimension reduction (SDR) of mixture data. We consider mixture data containing more than one component, where the components have distinct central subspaces. We adapt model-based sliced inverse regression (MSIR) to mixture data in a simple and intuitive manner. We employ mixture probabilistic principal component analysis (MPPCA) to estimate each central subspace and to cluster the data points. Results from simulation studies and a real data set show that our method captures the appropriate central subspaces satisfactorily and is robust to the number of slices chosen. Discussions of root selection, estimation accuracy, and classification, together with initial value issues of MPPCA and related simulation results, are also provided.

Analyses of Power Consumption of the Heat Pump Dryer in the Automobile Drying Process by using the Principal Component Analysis and Multiple Regression

  • 이창용;송근수;김진호
    • 산업경영시스템학회지 / Vol. 38, No. 1 / pp.143-151 / 2015
  • In this paper, we investigate how the power consumption of a heat pump dryer depends on various factors in the drying process by analyzing the variables that affect the power consumption. Since many variables generally affect the power consumption, we use principal component analysis to reduce the number of variables (the dimensionality) to two or three and make the analysis feasible. We find that the first component is positively correlated with the entrance temperatures of various devices such as the compressor, expander, and evaporator, and that the second is negatively correlated with that of the condenser. We then model the power consumption as a multiple regression on two or three transformed variables of the selected principal components. We find that the fitted values from the multiple regression explain 80-90% of the observed values of the power consumption. These results can be applied to more elaborate control of the power consumption of the heat pump dryer.

Development of Regression Models Resolving High-Dimensional Data and Multicollinearity Problem for Heavy Rain Damage Data

  • 김정환;박지현;최창현;김형수
    • 대한토목학회논문집 / Vol. 38, No. 6 / pp.801-808 / 2018
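  • A linear regression model is generally fitted stably under the assumptions that the number of observations is sufficiently larger than the number of explanatory variables and that there is no serious multicollinearity among the explanatory variables. This study illustrates the difficulty of model fitting when these assumptions are violated by analyzing real heavy rain damage data and, as a remedy, examines pooling the data and then using a principal component regression model or a ridge regression model. The predictive performance of the proposed models was evaluated on independent data separate from the data used for fitting, and the proposed methods were confirmed to predict better than the linear regression model.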
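
As a rough illustration of why ridge regression remains usable when there are more explanatory variables than observations, the sketch below fits a ridge model in closed form and scores it on separate held-out data. It is not the authors' procedure: `fit_ridge`, `rmse`, the penalty `lam=10.0`, and the simulated data are all assumptions.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge fit on centered data: (X'X + lam*I)^-1 X'y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    p = X.shape[1]
    # For lam > 0 the matrix is invertible even when p exceeds n.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - y_mean))
    return y_mean - x_mean @ beta, beta

def rmse(y_true, y_pred):
    """Root mean squared prediction error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical high-dimensional setting: 20 observations, 50 predictors.
rng = np.random.default_rng(0)
n_train, n_test, p = 20, 200, 50
beta_true = np.zeros(p)
beta_true[:5] = 1.0
X_tr = rng.normal(size=(n_train, p))
X_te = rng.normal(size=(n_test, p))
y_tr = X_tr @ beta_true + rng.normal(size=n_train)
y_te = X_te @ beta_true + rng.normal(size=n_test)

b0, b = fit_ridge(X_tr, y_tr, lam=10.0)
print("held-out RMSE:", rmse(y_te, b0 + X_te @ b))
```

Evaluating on data not used for fitting, as the abstract describes, is what makes the comparison between candidate models meaningful; in practice `lam` would be chosen by cross-validation rather than fixed in advance.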

Algorithm for Finding the Best Principal Component Regression Models for Quantitative Analysis using NIR Spectra

  • 조정환
    • Journal of Pharmaceutical Investigation / Vol. 37, No. 6 / pp.377-395 / 2007
  • Near-infrared (NIR) spectral data have been used for the noninvasive analysis of various biological samples. Nonetheless, absorption bands in the NIR region overlap extensively, and it is very difficult to select the proper wavelengths of spectral data that give the best PCR (principal component regression) models for the analysis of constituents of biological samples. The NIR data were used after polynomial smoothing and first-order differentiation using Savitzky-Golay filters. To find the best PCR models, all possible combinations of the available principal components from the given NIR spectral data were generated by in-house programs written in MATLAB. All of the generated PCR models were compared in terms of SEC (standard error of calibration), $R^2$, SEP (standard error of prediction) and SECP (standard error of calibration and prediction) to find the best combination of principal components of the initial PCR models. The initial PCR models were found by SEC or Malinowski's indicator function, and a priori selection of spectral points was examined in terms of the correlation coefficients between the NIR data at each wavelength and the corresponding concentrations. To test the developed program, aqueous solutions of BSA (bovine serum albumin) and glucose were prepared and analyzed. As a result, the best PCR models were found using a priori selection of spectral points and final model selection by SEP or SECP.
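
The exhaustive search over combinations of principal components can be sketched as below. This is not the author's MATLAB program: `best_pc_subset` is a hypothetical name, the root mean squared error on a separate validation set is used as a stand-in for SEP (an assumption), the Savitzky-Golay preprocessing is omitted, and the data are random numbers rather than spectra.

```python
import itertools
import numpy as np

def best_pc_subset(X_cal, y_cal, X_val, y_val, max_pcs=6):
    """Try every non-empty subset of the first max_pcs principal components
    and keep the one with the lowest validation error."""
    x_mean, y_mean = X_cal.mean(axis=0), y_cal.mean()
    _, _, Vt = np.linalg.svd(X_cal - x_mean, full_matrices=False)
    V = Vt[:max_pcs].T
    T_cal = (X_cal - x_mean) @ V          # calibration scores
    T_val = (X_val - x_mean) @ V          # validation scores, same loadings
    best = None
    for size in range(1, V.shape[1] + 1):
        for subset in itertools.combinations(range(V.shape[1]), size):
            idx = list(subset)
            g, *_ = np.linalg.lstsq(T_cal[:, idx], y_cal - y_mean, rcond=None)
            resid = y_val - (y_mean + T_val[:, idx] @ g)
            err = float(np.sqrt(np.mean(resid ** 2)))   # stand-in for SEP
            if best is None or err < best[0]:
                best = (err, subset)
    return best                            # (validation error, PC indices)

# Hypothetical calibration and validation sets.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(80, 30))
y_all = X_all[:, :3].sum(axis=1) + 0.1 * rng.normal(size=80)
err, pcs = best_pc_subset(X_all[:50], y_all[:50], X_all[50:], y_all[50:])
print("selected components:", pcs, "validation error:", round(err, 3))
```

The number of candidate models grows as 2^k - 1 for k components, so the search is only practical for a modest `max_pcs`; the selected subset need not be the leading components, which is the point of searching all combinations.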