• Title/Abstract/Keywords: principal component regression

251 search results (processing time: 0.028 seconds)

Intensive comparison of semi-parametric and non-parametric dimension reduction methods in forward regression

  • Shin, Minju;Yoo, Jae Keun
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 29, No. 5
    • /
    • pp.615-627
    • /
    • 2022
  • Principal Fitted Component (PFC) is a semi-parametric sufficient dimension reduction (SDR) method originally proposed in Cook (2007). According to Cook (2007), the PFC is connected to other common non-parametric SDR methods, but the connection is limited to sliced inverse regression (Li, 1991) and ordinary least squares. Since no direct comparison between the two approaches in various forward regressions exists to date, practical guidance on choosing between them is necessary for statistical practitioners. To fill this practical need, in this paper we derive a new connection of the PFC to covariance methods (Yin and Cook, 2002), which are among the most popular SDR methods. Intensive numerical studies are also conducted to closely examine and compare the estimation performance of the semi- and non-parametric SDR methods for various forward regressions. The findings from the numerical studies are confirmed in a real data example.

Interpretation and Comparison of High PM2.5 Characteristics in Seoul and Busan based on the PCA/MLR Statistics from Two Level Meteorological Observations

  • 최다니엘;장임석;김철희
    • 대기
    • /
    • Vol. 31, No. 1
    • /
    • pp.29-43
    • /
    • 2021
  • In this study, a two-step statistical approach combining Principal Component Analysis (PCA) and Multiple Linear Regression (MLR) was employed to identify the main meteorological factors explaining high-PM2.5 episodes in two regions: Seoul and Busan. We first performed PCA to isolate the Principal Components (PCs), which are linear combinations of the meteorological variables observed at two levels: the surface and the 850 hPa level. The surface variables are temperature (T2m), wind speed, sea level pressure, and the south-north and west-east wind components; the 850 hPa upper-level variables are the south-north (v850) and west-east (u850) wind components and vertical stability. Second, we carried out MLR analysis and verified the relationships between the PM2.5 daily mean concentration and the meteorological PCs. Our two-step statistical approach revealed that in Seoul, the dominant factors influencing high-PM2.5 days are mainly upper-level wind characteristics in winter, including positive u850 and negative v850, indicating that the continental (or Siberian) anticyclone had a strong influence. In Busan, however, the dominant factors explaining high PM2.5 concentrations were associated with high T2m and negative u850 in summer. This suggests that the marine anticyclone had a considerable effect on Busan's high PM2.5, with high temperature relevant to vigorous photochemical secondary generation. The differences and similarities between the two regions, derived from statistical approaches alone, imply that high-PM2.5 episodes in Korea show their own unique characteristics and seasonality, which are mostly explainable by two-layer (surface and upper) mesoscale meteorological variables.

Multi-currencies portfolio strategy using principal component analysis and logistic regression

  • 심경식;안재준;오경주
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 23, No. 1
    • /
    • pp.151-159
    • /
    • 2012
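  • This paper proposes developing a multi-currency portfolio strategy for the foreign exchange market using principal component analysis and logistic regression. Although many studies have analyzed exchange rate markets, comparatively few have developed trading strategies for the foreign exchange market. This study has two main objectives. The first is to propose a method of assigning weights to the exchange rates of the various countries that make up the portfolio by applying principal component analysis. The second is to determine appropriate buy and sell times for the constructed portfolio using logistic regression. The experimental results are expected to demonstrate the usefulness of the proposed investment strategy and thereby help market participants in their investment decisions.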

Development of Virtual Metrology Models in Semiconductor Manufacturing Using Genetic Algorithm and Kernel Partial Least Squares Regression

  • 김보건;염봉진
    • 산업공학
    • /
    • Vol. 23, No. 3
    • /
    • pp.229-238
    • /
    • 2010
  • Virtual metrology (VM), a critical component of semiconductor manufacturing, is an efficient way of assessing the quality of wafers that are not actually measured. It relies on a model relating equipment sensor data (obtained for all wafers) to the quality characteristics of the wafers that are actually measured. This paper considers principal component regression (PCR), partial least squares regression (PLSR), kernel PCR (KPCR), and kernel PLSR (KPLSR) as VM models. For each regression model, two cases are considered: one uses all explanatory variables in developing the model, and the other selects significant variables using the genetic algorithm (GA). The prediction performances of the resulting 8 regression models are compared on short- and long-term etch process data. Among other findings, the GA-KPLSR model performs best for both types of data. In particular, its prediction accuracy is within the requirement for the short-term data, implying that it can be used to implement VM for real etch processes.

Pre-processing and Bias Correction for AMSU-A Radiance Data Based on Statistical Methods

  • 이시혜;김상일;전형욱;김주혜;강전호
    • 대기
    • /
    • Vol. 24, No. 4
    • /
    • pp.491-502
    • /
    • 2014
  • As part of the KIAPS (Korea Institute of Atmospheric Prediction Systems) Package for Observation Processing (KPOP), we have developed modules for Advanced Microwave Sounding Unit-A (AMSU-A) pre-processing and bias correction. The KPOP system calculates the airmass bias correction coefficients by multiple linear regression, in which the scan-corrected innovation is the dependent variable and the thicknesses of the 850~300, 200~50, 50~5, and 10~1 hPa layers are the independent variables. Multicollinearity among the four airmass predictors was detected with the Variance Inflation Factor (VIF), which quantifies the severity of multicollinearity in a least squares regression. To resolve the multicollinearity, we adopted simple linear regression and Principal Component Regression (PCR) to calculate the airmass bias correction coefficients and compared the results with those from the multiple linear regression. The analysis ranks the methods, from best to worst performance, as multiple linear, principal component, and simple linear regression. For bias correction of AMSU-A channel 4, which is the most sensitive to the lower troposphere, the multiple linear regression with all four airmass predictors is superior to the simple linear regression with the single 850~300 hPa airmass predictor. PCR retaining the components that account for 95% of the cumulative variance gave results similar to those of the multiple linear regression.
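
The abstract above combines two standard tools: the VIF as a multicollinearity diagnostic, and PCR with components kept up to 95% cumulative variance. The following is a minimal sketch of both on synthetic data, assuming NumPy is available. The function names (`vif`, `pcr_fit`, `pcr_predict`) and the data are hypothetical illustrations, not the KPOP implementation.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n_samples x n_features).

    The VIFs equal the diagonal of the inverse of the predictors'
    correlation matrix.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def pcr_fit(X, y, var_threshold=0.95):
    """Principal component regression on standardized predictors.

    Keeps the smallest number of leading components whose cumulative
    explained variance reaches var_threshold.
    """
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    # Eigen-decomposition of the correlation matrix (covariance of Z).
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(cum, var_threshold)) + 1, X.shape[1])
    V = eigvecs[:, :k]
    # Ordinary least squares of the centered response on the k PC scores.
    gamma, *_ = np.linalg.lstsq(Z @ V, y - y.mean(), rcond=None)
    return {"mean": mean, "std": std, "beta": V @ gamma,
            "intercept": y.mean(), "k": k}

def pcr_predict(model, X):
    Z = (X - model["mean"]) / model["std"]
    return model["intercept"] + Z @ model["beta"]

# Synthetic example (assumption): three nearly identical predictors plus
# one independent predictor, mimicking strongly collinear inputs.
rng = np.random.default_rng(0)
n = 200
base = rng.normal(size=n)
cols = [base + 0.05 * rng.normal(size=n) for _ in range(3)]
cols.append(rng.normal(size=n))
X = np.column_stack(cols)
y = X @ np.array([1.0, 1.0, 1.0, 0.5]) + 0.1 * rng.normal(size=n)

print("VIF per predictor:", vif(X))
model = pcr_fit(X, y, var_threshold=0.95)
print("components kept:", model["k"])
```

A VIF well above 10 is a common rule of thumb for problematic collinearity. In this sketch the regression is carried out on the retained component scores and then mapped back to coefficients on the standardized predictors.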

PRINCIPAL COMPONENTS BASED SUPPORT VECTOR REGRESSION MODEL FOR ON-LINE INSTRUMENT CALIBRATION MONITORING IN NPPS

  • Seo, In-Yong;Ha, Bok-Nam;Lee, Sung-Woo;Shin, Chang-Hoon;Kim, Seong-Jun
    • Nuclear Engineering and Technology
    • /
    • Vol. 42, No. 2
    • /
    • pp.219-230
    • /
    • 2010
  • In nuclear power plants (NPPs), periodic sensor calibrations are required to assure that sensors are operating correctly. By checking the sensor's operating status at every fuel outage, faulty sensors may remain undetected for periods of up to 24 months. Moreover, typically, only a few faulty sensors are found to be calibrated. For the safe operation of NPP and the reduction of unnecessary calibration, on-line instrument calibration monitoring is needed. In this study, principal component-based auto-associative support vector regression (PCSVR) using response surface methodology (RSM) is proposed for the sensor signal validation of NPPs. This paper describes the design of a PCSVR-based sensor validation system for a power generation system. RSM is employed to determine the optimal values of SVR hyperparameters and is compared to the genetic algorithm (GA). The proposed PCSVR model is confirmed with the actual plant data of Kori Nuclear Power Plant Unit 3 and is compared with the Auto-Associative support vector regression (AASVR) and the auto-associative neural network (AANN) model. The auto-sensitivity of AASVR is improved by around six times by using a PCA, resulting in good detection of sensor drift. Compared to AANN, accuracy and cross-sensitivity are better while the auto-sensitivity is almost the same. Meanwhile, the proposed RSM for the optimization of the PCSVR algorithm performs even better in terms of accuracy, auto-sensitivity, and averaged maximum error, except in averaged RMS error, and this method is much more time efficient compared to the conventional GA method.

Mean Annual Precipitation Estimates of Korea

  • 김승;김규호
    • 한국수자원학회:학술대회논문집
    • /
    • 한국수자원학회 1989년도 수공학논총 제31권
    • /
    • pp.5-16
    • /
    • 1989
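  • This study estimated the nationwide mean annual precipitation from 84 years of data (1985-1988) at 60 stations under the Central Meteorological Office. Ungauged portions were filled in with a regression equation using principal components, and the arithmetic mean, the mean based on Thiessen coefficients, and the mean from the isohyetal method were also compared.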

A Study on the Factor Analysis of the Encounter Data in the Maritime Traffic Environment

  • 김광일;정중식;박계각
    • 한국지능시스템학회논문지
    • /
    • Vol. 25, No. 3
    • /
    • pp.293-298
    • /
    • 2015
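  • The variables of ship encounter data collected in maritime traffic situations make it possible to analyze the risk of ship collisions and near-collisions by statistical methods. In this study, factor analysis is applied to the many ship collision risk assessment variables extracted from ship encounter data in order to determine the main factors that affect collision risk. To determine each factor, the encounter data variables were transformed toward a normal distribution and standardized, and the factors were then determined by principal component analysis. The factor analysis summarized the variables into a ship proximity factor and a collision-avoidance change factor.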

Functional Data Analysis of Temperature and Precipitation Data

  • 강기훈;안홍세
    • 응용통계연구
    • /
    • Vol. 19, No. 3
    • /
    • pp.431-445
    • /
    • 2006
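  • This study introduces some of the theory of functional data analysis and applies the techniques to real data. On the theoretical side, it covers how to represent data as functional data using a basis, principal component analysis for examining the variability of functional data, and linear models. Functional data analysis techniques were then applied separately to Korean temperature data and precipitation data. In addition, a functional regression model was fitted to the temperature and precipitation data to examine the functional relationship between the two variables.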

Principal Components Logistic Regression based on Robust Estimation

  • 김부용;강명욱;장혜원
    • 응용통계연구
    • /
    • Vol. 22, No. 3
    • /
    • pp.531-539
    • /
    • 2009
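  • Logistic regression is widely used in data mining for customer relationship management. During model specification in this field, highly correlated explanatory variables are often included together in the model, causing multicollinearity, and if the regression data also contain outliers, the maximum likelihood estimator becomes seriously defective. Robust principal components logistic regression can be applied to address both problems at once. This paper develops a model for determining the selection criterion for principal components and proposes a robust estimation method that reduces the influence of outliers on the estimates in the principal components model. A simulation study confirmed that the proposed estimation method appropriately resolves the problems caused by multicollinearity and outliers.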