• Title/Abstract/Keywords: principal component regression

Search results: 251 items (processing time 0.028 s)

Regional Geological Mapping by Principal Component Analysis of the Landsat TM Data in a Heavily Vegetated Area

  • 朴鍾南;徐延熙
    • 대한원격탐사학회지
    • /
    • Vol. 4, No. 1
    • /
    • pp.49-60
    • /
    • 1988
  • Principal Component Analysis (PCA) was applied to a multivariate data set derived from Landsat TM data for regional geological mapping in the heavily vegetated and topographically rugged Chungju area. The variables were selected by statistical analysis based on the magnitude of the regression sum of squares in multiple regression, and comprise R1/2/R3/4, R2/3, R5/7/R4/3, R1/2, R3/4, R4/3, and R4/5. The PCA results show that some of the later principal components (here PC 3 and PC 5) are geologically more significant than the earlier major components, PC 1 and PC 2. The first two components, which account for 96% of the total information in the data set, mainly represent vegetation reflectance and topographic effects. The remaining components represent only 3% of the total information, which statistically suggests that the information they carry is unstable; nevertheless, the geological significance of PC 3 and PC 5 in this study implies that applying the technique in more favorable areas should give much better results.

Water Demand Forecasting by Characteristics of City Using Principal Component and Cluster Analyses

  • Choi, Tae-Ho;Kwon, O-Eun;Koo, Ja-Yong
    • Environmental Engineering Research
    • /
    • Vol. 15, No. 3
    • /
    • pp.135-140
    • /
    • 2010
  • Because urban characteristics vary from city to city, conventional water demand prediction based on average liters per capita per day cannot give accurate results, as it fails to consider several variables. This study therefore considered social and industrial factors of 164 local cities, in addition to population and other directly influential factors, and used principal component and cluster analyses to develop a more efficient water demand prediction model that reflects the unique locality of each city. After clustering, multiple regression models were developed: the $R^2$ value of the inclusive multiple regression model was 0.59, whereas those of Clusters A and B were 0.62 and 0.74, respectively. The cluster-specific multiple regression models were thus considered more reasonable and valid than the inclusive multiple regression model. In summary, the water demand prediction model that uses principal component and cluster analyses as the standards for classifying localities has a better coefficient of determination than the inclusive multiple regression model, which does not consider localities.

Logistic Regression Classification by Principal Component Selection

  • Kim, Kiho;Lee, Seokho
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 21, No. 1
    • /
    • pp.61-68
    • /
    • 2014
  • We propose binary classification methods that modify logistic regression classification. We use principal components in place of the original variables and apply variable selection procedures to select among them. We describe the resulting classifiers and discuss their properties. The performance of our proposals is illustrated numerically and compared with other existing classification methods using synthetic and real datasets.

Sensitivity Analysis in Principal Component Regression: Numerical Investigation

  • 신재경
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 2
    • /
    • pp.1-9
    • /
    • 1991
  • Shin, Tarumi and Tanaka (1989) discussed a method of sensitivity analysis in principal component regression (PCR) based on an influence function derived by Tanaka (1988). The present paper continues that work. We first consider two new influence measures, then apply the proposed method to various data sets and discuss some properties of sensitivity analysis in PCR.

A study on principal component analysis using penalty method

  • 박철용
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 28, No. 4
    • /
    • pp.721-731
    • /
    • 2017
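  • This study introduces principal component analysis methods that use the Lasso penalty. Two approaches are commonly used to apply a Lasso penalty to principal component analysis. The first treats a principal component as the response variable and the original data matrix as the explanatory variables, and imposes a Lasso penalty (more generally, an elastic net penalty) when the regression coefficients are used to find the optimal linear combination vector. The second approximates the original data matrix by a singular value decomposition and imposes a Lasso penalty on the remaining residual matrix to find the optimal linear combination vector. This study reviews both approaches in detail; they have the advantage of remaining applicable when the number of variables exceeds the sample size. Both methods are also applied in a real data analysis using R, and their results are compared. Specifically, they are applied to the crime data of Ahamad (1967), in which the number of variables exceeds the sample size.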
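
The second approach in the abstract above (a rank-one SVD approximation with a Lasso penalty on the loading vector) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract reports using R, whereas this sketch uses Python with NumPy; the function names `soft_threshold` and `sparse_rank_one`, the penalty value `lam`, and the synthetic data are all hypothetical and stand in for the Ahamad (1967) crime data, which is not reproduced here.

```python
import numpy as np

def soft_threshold(z, t):
    """Componentwise soft-thresholding: shrink z toward 0 by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_rank_one(X, lam, n_iter=200, tol=1e-10):
    """Alternately minimize ||X - u v^T||_F^2 + lam * ||v||_1 with ||u|| = 1.

    Returns the loading vector v scaled to unit length
    (or all zeros if the penalty removes every variable).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    u = U[:, 0]              # start from the ordinary leading singular vectors
    v = s[0] * Vt[0]
    for _ in range(n_iter):
        # For fixed unit u, the Lasso-penalized update of v is a soft-threshold.
        v_new = soft_threshold(X.T @ u, lam / 2.0)
        if not np.any(v_new):
            return v_new     # penalty too large: every loading is zero
        # For fixed v, the best unit-length u is X v normalized.
        Xv = X @ v_new
        u = Xv / np.linalg.norm(Xv)
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    return v / np.linalg.norm(v)

# Synthetic example (assumption): 10 observations, 30 variables (p > n),
# where only the first three variables share a common factor.
rng = np.random.default_rng(0)
n, p = 10, 30
z = np.linspace(-1.0, 1.0, n)
w = np.zeros(p)
w[:3] = 3.0
X = np.outer(z, w) + 0.1 * rng.normal(size=(n, p))
X = X - X.mean(axis=0)       # center the columns before PCA

v_dense = sparse_rank_one(X, lam=0.0)    # lam = 0 gives the ordinary first loading
v_sparse = sparse_rank_one(X, lam=2.0)   # a positive lam zeroes out small loadings
print("nonzero loadings without penalty:", np.count_nonzero(v_dense))
print("nonzero loadings with penalty:   ", np.count_nonzero(v_sparse))
```

With `lam=0.0` the routine returns the ordinary first principal component loading (up to sign), so the penalty is the only source of sparsity in the second call.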

Model-based inverse regression for mixture data

  • Choi, Changhwan;Park, Chongsun
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 24, No. 1
    • /
    • pp.97-113
    • /
    • 2017
  • This paper proposes a method for sufficient dimension reduction (SDR) of mixture data. We consider mixture data containing more than one component, each with a distinct central subspace. We apply model-based sliced inverse regression (MSIR) to the mixture data in a simple and intuitive manner, and employ mixture probabilistic principal component analysis (MPPCA) to estimate each central subspace and cluster the data points. Results from simulation studies and a real data set show that our method captures the appropriate central subspaces satisfactorily and is robust to the number of slices chosen. Discussions of root selection, estimation accuracy, and classification, together with initial value issues of MPPCA and related simulation results, are also provided.

Application of Regression Analysis Model to TOC Concentration Estimation - Osu Stream Watershed -

  • 박진환;문명진;한성욱;이형진;정수정;황경섭;김갑순
    • 환경영향평가
    • /
    • Vol. 23, No. 3
    • /
    • pp.187-196
    • /
    • 2014
  • The objective of this study is to evaluate and analyze the water environment system of the Osu stream watershed. The data, collected from January 2009 to December 2011, include water temperature, pH, DO, EC, BOD, COD, TOC, SS, T-N, T-P and discharge, and were used for principal component analysis and factor analysis. The results are as follows. The primary factors obtained from both the principal component analysis and the factor analysis were BOD, COD, TOC, SS and T-P. The results of the two analyses were then applied to both a simple regression model and a multiple regression model. The regression model was developed as case 1, using concentrations of water quality parameters, and case 2, using delivery loads. The coefficient of determination for case 1 fell between 0.629 and 0.866, lower than that for case 2, which fell between 0.946 and 0.998; the case 2 model is therefore the more reliable choice. The coefficient of determination between the values estimated by applying the regression model to the 2012 data and the actual measurements was over 0.6 overall, indicating a high correlation between the two. The same model can be applied to estimate TOC concentrations in the future.

Principal Component and Multiple Regression Analysis for Steel Fiber Reinforced Concrete (SFRC) Beams

  • Islam, Mohammad S.;Alam, Shahria
    • International Journal of Concrete Structures and Materials
    • /
    • Vol. 7, No. 4
    • /
    • pp.303-317
    • /
    • 2013
  • This study evaluates the shear strength of steel fiber reinforced concrete (SFRC) beams using a database of extensive experimental results for 222 SFRC beams without stirrups. To predict the analytical shear strength of the SFRC beams more precisely, the selected beams were sorted into six different groups based on their ultimate concrete strength (low strength with $f_c^{\prime} < 50$ MPa and high strength with $f_c^{\prime} \geq 50$ MPa), span-depth ratio (shallow beam with $a/d \geq 2.5$ and deep beam with $a/d < 2.5$) and steel fiber shape (plain, crimped and hooked). Principal component and multiple regression analyses were performed to determine the most feasible model for predicting the shear strength of SFRC beams. A variety of statistical analyses were conducted, and the results were compared with those of existing equations for estimating the shear strength of SFRC beams. The results showed that the recommended empirical equations assess the shear strength of SFRC beams more accurately than the previously developed models.

A Criterion for the Selection of Principal Components in the Robust Principal Component Regression

  • 김부용
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 18, No. 6
    • /
    • pp.761-770
    • /
    • 2011
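  • When a regression model includes highly correlated explanatory variables, multicollinearity arises; if the data also contain regression outliers, statistical inference based on the least squares estimator becomes seriously flawed. Such situations are common in data mining, and this paper proposes robust principal component regression as a way to address both problems simultaneously. In particular, a new criterion is developed for selecting the optimal principal components: it is based on the MVE estimator rather than the sample covariance of the explanatory variables, and on the magnitude of the condition index rather than the eigenvalue. For estimation in the principal component model, LTS estimation, which is robust to regression outliers, is adopted. Simulation studies confirm that the proposed criterion resolves the problems caused by multicollinearity and outliers better than existing criteria.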
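
The idea of selecting principal components by condition index rather than by eigenvalue can be illustrated with a non-robust sketch. The assumptions here are substantial and deliberate: the ordinary sample correlation matrix stands in for the MVE estimator, ordinary least squares stands in for LTS estimation, and the cutoff `max_index=30.0` is a commonly quoted rule of thumb for severe collinearity, not the criterion proposed in the paper. The function name `pcr_by_condition_index` and the synthetic data are hypothetical.

```python
import numpy as np

def pcr_by_condition_index(X, y, max_index=30.0):
    """Principal component regression keeping components whose
    condition index sqrt(lambda_max / lambda_j) is at most max_index.

    Assumption: plain sample correlation and OLS are used here in place
    of the robust MVE and LTS estimators named in the abstract.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized predictors
    R = np.corrcoef(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)
    order = np.argsort(eigval)[::-1]                   # largest eigenvalue first
    eigval = np.clip(eigval[order], 1e-15, None)       # guard against zero/negative
    eigvec = eigvec[:, order]
    cond_index = np.sqrt(eigval[0] / eigval)
    keep = cond_index <= max_index
    scores = Z @ eigvec[:, keep]                       # retained component scores
    gamma, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
    beta_std = eigvec[:, keep] @ gamma                 # back to standardized X scale
    return beta_std, cond_index, keep

# Synthetic example (assumption): x3 is almost an exact sum of x1 and x2.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.001 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 - 1.0 * x2 + 0.5 * rng.normal(size=n)

beta_std, cond_index, keep = pcr_by_condition_index(X, y)
print("condition indices:", np.round(cond_index, 1))
print("components kept:  ", keep)
```

In this example the component associated with the near-exact linear dependence has a very large condition index and is dropped, while the two well-conditioned components are retained.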

Principal Components Regression in Logistic Model

  • 김부용;강명욱
    • 응용통계연구
    • /
    • Vol. 21, No. 4
    • /
    • pp.571-580
    • /
    • 2008
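  • Logistic regression analysis is widely used in areas such as customer relationship management and credit risk management, and logistic regression models in these areas often include many highly correlated explanatory variables, which causes multicollinearity. It is well known that the maximum likelihood estimator is seriously flawed in the presence of multicollinearity. To address this problem, logistic principal component regression is studied, and a new method is proposed for principal component selection, a key step in the analysis. The condition index value that minimizes the variance of the estimator is measured, the main factors affecting this value are identified by conjoint analysis, and a model is built to determine the principal component selection criterion. Simulation studies confirm that the proposed method resolves the multicollinearity problem appropriately while also improving the fit of the model.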
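
As a rough illustration of logistic principal component regression, the sketch below fits a logistic model on the leading component scores and maps the coefficients back to the standardized predictors. It is only a sketch under stated assumptions: the number of retained components `k` is fixed by hand, whereas the paper derives its selection criterion from condition indices and conjoint analysis, which is not reproduced here. The function names `fit_logistic_irls` and `logistic_pcr` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_logistic_irls(T, y, n_iter=25):
    """Maximum likelihood logistic fit by Newton's method (IRLS).
    Returns [intercept, slopes...] for the columns of T."""
    A = np.column_stack([np.ones(len(T)), T])
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ w)))
        weights = p * (1.0 - p)
        hessian = A.T @ (A * weights[:, None])
        gradient = A.T @ (y - p)
        w = w + np.linalg.solve(hessian, gradient)
    return w

def logistic_pcr(X, y, k):
    """Logistic regression on the first k principal component scores
    of the standardized predictors (k is chosen by hand: an assumption)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V_k = Vt[:k].T                        # loadings of the k leading components
    w = fit_logistic_irls(Z @ V_k, y)
    return w[0], V_k @ w[1:], Z           # intercept, standardized-scale slopes, Z

# Synthetic example (assumption): x3 is nearly collinear with x1 + x2.
rng = np.random.default_rng(2)
n = 400
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.001 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
prob = 1.0 / (1.0 + np.exp(-(1.5 * x1 - 1.0 * x2)))
y = (rng.random(n) < prob).astype(float)

intercept, beta_std, Z = logistic_pcr(X, y, k=2)
p_hat = 1.0 / (1.0 + np.exp(-(intercept + Z @ beta_std)))
accuracy = float(np.mean((p_hat > 0.5) == (y == 1.0)))
print("in-sample accuracy with 2 components:", round(accuracy, 3))
```

Dropping the third component removes the direction of near-exact dependence, so the Newton iterations work with a well-conditioned score matrix rather than the nearly singular original design.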