• Title/Summary/Keyword: principal component regression


Intensive comparison of semi-parametric and non-parametric dimension reduction methods in forward regression

  • Shin, Minju;Yoo, Jae Keun
    • Communications for Statistical Applications and Methods, v.29 no.5, pp.615-627, 2022
  • Principal Fitted Component (PFC) is a semi-parametric sufficient dimension reduction (SDR) method originally proposed by Cook (2007). According to Cook (2007), PFC is connected to some of the usual non-parametric SDR methods, but the connection is limited to sliced inverse regression (Li, 1991) and ordinary least squares. Since the two approaches have not yet been compared directly across various forward regressions, statistical practitioners need practical guidance for choosing between them. To meet this need, we derive a new connection between PFC and covariance methods (Yin and Cook, 2002), one of the most popular SDR methods. We also conduct extensive numerical studies to examine and compare the estimation performance of the semi- and non-parametric SDR methods under various forward regressions. The findings from the numerical studies are confirmed in a real data example.
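
As a concrete reference point for the non-parametric side of this comparison, here is a minimal NumPy sketch of sliced inverse regression (Li, 1991), one of the methods the abstract says PFC is connected to. It is not PFC, not the covariance method, and not the authors' derivation. The function name `sir_directions`, the equal-count slicing, and the simulated cubic forward regression are illustrative assumptions.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Sliced inverse regression (Li, 1991): slice on y, average the
    standardized predictors within each slice, and eigen-decompose the
    weighted covariance of the slice means."""
    n, p = X.shape
    # Standardize X with the inverse square root of its covariance matrix
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - X.mean(axis=0)) @ inv_sqrt
    # Weighted covariance of slice means (slices of roughly equal size)
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M, mapped back to the scale of X and normalized
    w, v = np.linalg.eigh(M)
    B = inv_sqrt @ v[:, np.argsort(w)[::-1][:n_dirs]]
    return B / np.linalg.norm(B, axis=0)

# Hypothetical forward regression: y depends on X only through X @ beta
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
y = (X @ beta) ** 3 + 0.1 * rng.normal(size=2000)
b = sir_directions(X, y)[:, 0]
print(abs(b @ beta))  # close to 1 when the direction is recovered
```

The absolute inner product is used because the sign of an estimated direction is arbitrary.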

Interpretation and Comparison of High PM2.5 Characteristics in Seoul and Busan based on the PCA/MLR Statistics from Two Level Meteorological Observations

  • Choi, Daniel;Chang, Lim-Seok;Kim, Cheol-Hee
    • Atmosphere, v.31 no.1, pp.29-43, 2021
  • In this study, a two-step statistical approach combining Principal Component Analysis (PCA) and Multiple Linear Regression (MLR) was employed to identify the main meteorological factors explaining high-PM2.5 episodes in two regions, Seoul and Busan. We first performed PCA to obtain principal components (PCs), i.e., linear combinations of meteorological variables observed at two levels: the surface and 850 hPa. The surface variables are temperature (T2m), wind speed, sea level pressure, and the south-north and west-east wind components; the 850 hPa variables are the south-north (v850) and west-east (u850) wind components and vertical stability. Second, we carried out MLR analysis to verify the relationships between daily mean PM2.5 concentration and the meteorological PCs. This two-step approach revealed that in Seoul the dominant factors for high-PM2.5 days are mainly winter upper-level wind characteristics, namely positive u850 and negative v850, indicating a strong influence of the continental (Siberian) anticyclone. In Busan, however, the dominant factors explaining high PM2.5 concentrations were high T2m and negative u850 in summer. This suggests that the marine anticyclone had a considerable effect on Busan's high PM2.5, together with high temperatures that favor vigorous photochemical secondary generation. The differences and similarities between the two regions, derived from statistical approaches alone, imply that high-PM2.5 episodes in Korea have their own characteristics and seasonality, mostly explainable by two-layer (surface and upper-level) mesoscale meteorological variables.
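
The two-step PCA-then-MLR procedure described above can be sketched in a few lines of NumPy. This is a generic illustration under stated assumptions, not the authors' code: the eight columns of `X` merely stand in for the eight listed variables (T2m, wind speed, sea level pressure, two surface wind components, v850, u850, vertical stability), the data are simulated, and the function name `pca_then_mlr` and the choice of three components are hypothetical.

```python
import numpy as np

def pca_then_mlr(X, y, n_components):
    """Step 1: PCA on standardized predictors. Step 2: OLS of y on the
    scores of the leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    evals, evecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(evals)[::-1][:n_components]
    loadings = evecs[:, order]          # each column is one PC's loading vector
    scores = Z @ loadings               # PC scores used as MLR regressors
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return loadings, coef               # coef[0] is the intercept

# Simulated stand-ins: 8 daily meteorological variables and daily mean PM2.5
rng = np.random.default_rng(1)
X = rng.normal(size=(365, 8))
y = 30 + 5 * X[:, 0] - 3 * X[:, 6] + rng.normal(size=365)
loadings, coef = pca_then_mlr(X, y, n_components=3)
```

Interpreting a PC (for example, as an upper-level wind pattern) then amounts to reading which variables carry large weights in the corresponding column of `loadings`.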

Multi-currencies portfolio strategy using principal component analysis and logistic regression

  • Shim, Kyung-Sik;Ahn, Jae-Joon;Oh, Kyong-Joo
    • Journal of the Korean Data and Information Science Society, v.23 no.1, pp.151-159, 2012
  • This paper proposes a multi-currency portfolio strategy for the foreign exchange market using principal component analysis (PCA) and logistic regression (LR). While there is a great deal of literature analyzing the foreign exchange market, relatively little work has addressed the development of trading strategies for it. The paper has two objectives. The first is to suggest a portfolio allocation method based on PCA. The second is to determine market timing, that is, when to make buy or sell decisions, using LR. The results show that the proposed model is a useful trading strategy for the foreign exchange market and can give investors important information for their investment decisions.
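
The abstract does not say exactly how PCA is turned into allocation weights. One common rule, shown here purely as an assumption, is to make weights proportional to the first principal component's loadings on the return covariance matrix. The function `first_pc_weights`, the four simulated currencies, and the shared-factor data are hypothetical, and the LR market-timing step is not shown.

```python
import numpy as np

def first_pc_weights(returns):
    """Hypothetical allocation rule: weights proportional to the loadings of
    the first principal component of the return covariance, summing to 1."""
    evals, evecs = np.linalg.eigh(np.cov(returns, rowvar=False))
    pc1 = evecs[:, np.argmax(evals)]
    if pc1.sum() < 0:                    # eigenvector sign is arbitrary
        pc1 = -pc1
    return pc1 / pc1.sum()

# Simulated daily returns: 4 hypothetical currencies driven by one shared factor
rng = np.random.default_rng(2)
common = rng.normal(scale=0.01, size=(500, 1))
returns = common + rng.normal(scale=0.002, size=(500, 4))
w = first_pc_weights(returns)
```

Because all four simulated currencies load almost equally on the shared factor, the resulting weights come out close to an equal split. Real exchange-rate data would generally give unequal, and possibly negative, weights.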

Development of Virtual Metrology Models in Semiconductor Manufacturing Using Genetic Algorithm and Kernel Partial Least Squares Regression

  • Kim, Bo-Keon;Yum, Bong-Jin
    • IE interfaces, v.23 no.3, pp.229-238, 2010
  • Virtual metrology (VM), a critical component of semiconductor manufacturing, is an efficient way of assessing the quality of wafers that are not actually measured. It relies on a model relating equipment sensor data (available for all wafers) to the quality characteristics of the wafers that are actually measured. This paper considers principal component regression (PCR), partial least squares regression (PLSR), kernel PCR (KPCR), and kernel PLSR (KPLSR) as VM models. For each regression model, two cases are considered: one uses all explanatory variables, and the other selects significant variables using a genetic algorithm (GA). The prediction performance of the resulting 8 regression models is compared on short- and long-term etch process data. Among other findings, the GA-KPLSR model performs best for both types of data. In particular, its prediction ability on the short-term data is within the requirement, implying that it can be used to implement VM for real etch processes.
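
Of the four model families compared, kernel PCR is the least familiar. The minimal NumPy sketch below shows the idea on training data only: compute kernel PCA scores from a centred Gaussian (RBF) kernel matrix, then fit OLS on those scores. The RBF kernel, `gamma=1.0`, the 12 components, and the 1-D simulated data are assumptions for illustration. GA variable selection, KPLSR, and out-of-sample prediction (which requires centring the test kernel against the training data) are omitted.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_pcr_fit(X, y, n_components, gamma=1.0):
    """Kernel PCR on training data: kernel PCA scores, then OLS on them."""
    n = X.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ rbf_kernel(X, X, gamma) @ J       # centre the data in feature space
    evals, evecs = np.linalg.eigh(Kc)
    idx = np.argsort(evals)[::-1][:n_components]
    # Training-point scores on each kernel PC: sqrt(eigenvalue) * eigenvector
    scores = evecs[:, idx] * np.sqrt(np.clip(evals[idx], 0.0, None))
    design = np.column_stack([np.ones(n), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef                       # in-sample fitted values

# Simulated nonlinear relationship standing in for sensor data -> wafer quality
rng = np.random.default_rng(3)
X = np.sort(rng.uniform(-2, 2, size=(200, 1)), axis=0)
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=200)
fitted = kernel_pcr_fit(X, y, n_components=12)
r2 = 1 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

Linear PCR would fit a straight line to this curved relationship. The kernel version captures it because the kernel PCs are nonlinear functions of `X`.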

Pre-processing and Bias Correction for AMSU-A Radiance Data Based on Statistical Methods

  • Lee, Sihye;Kim, Sangil;Chun, Hyoung-Wook;Kim, Ju-Hye;Kang, Jeon-Ho
    • Atmosphere, v.24 no.4, pp.491-502, 2014
  • As part of the KIAPS (Korea Institute of Atmospheric Prediction Systems) Package for Observation Processing (KPOP), we have developed modules for Advanced Microwave Sounding Unit-A (AMSU-A) pre-processing and bias correction. The KPOP system calculates the airmass bias correction coefficients by multiple linear regression, with the scan-corrected innovation as the dependent variable and the thicknesses of the 850~300, 200~50, 50~5, and 10~1 hPa layers as the independent variables. The Variance Inflation Factor (VIF), which quantifies the severity of multicollinearity in least-squares regression, shows multicollinearity among the four airmass predictors. To resolve it, we adopted simple linear regression and Principal Component Regression (PCR) to calculate the airmass bias correction coefficients and compared the results with those of the multiple linear regression. In terms of performance, multiple linear regression ranks first, followed by PCR and then simple linear regression. For bias correction of AMSU-A channel 4, which is the most sensitive to the lower troposphere, the multiple linear regression with all four airmass predictors is superior to the simple linear regression with the single 850~300 hPa airmass predictor. PCR retaining the components whose eigenvalues account for 95% of the cumulative variance gave results similar to those of the multiple linear regression.
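
The two diagnostics in this abstract can be sketched in NumPy: VIF for detecting multicollinearity, and PCR that keeps just enough components to reach 95% of the cumulative eigenvalue variance. It uses the identity that VIF_j equals the j-th diagonal element of the inverse predictor correlation matrix. The four near-collinear simulated columns only stand in for the layer thicknesses, and the function names are hypothetical rather than KPOP's.

```python
import numpy as np

def vif(X):
    """Variance inflation factors: the diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))

def pcr_cumulative_variance(X, y, threshold=0.95):
    """PCR keeping the fewest PCs whose eigenvalues reach `threshold` of the
    total variance; returns k and coefficients for the standardized predictors."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    evals, evecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    ratio = np.cumsum(evals) / evals.sum()
    k = min(int(np.searchsorted(ratio, threshold)) + 1, len(evals))
    V = evecs[:, :k]
    gamma, *_ = np.linalg.lstsq(Z @ V, y - y.mean(), rcond=None)
    return k, V @ gamma                       # map PC coefficients back to predictors

# Simulated stand-ins for four strongly correlated airmass predictors
rng = np.random.default_rng(4)
base = rng.normal(size=500)
X = base[:, None] + 0.05 * rng.normal(size=(500, 4))
y = X.sum(axis=1) + 0.1 * rng.normal(size=500)
print(vif(X))                                 # all far above the usual cutoff of 10
k, beta = pcr_cumulative_variance(X, y)       # k == 1 here
```

With predictors this collinear, the first component alone explains more than 95% of the variance, so PCR reduces to a single well-conditioned regressor.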

PRINCIPAL COMPONENTS BASED SUPPORT VECTOR REGRESSION MODEL FOR ON-LINE INSTRUMENT CALIBRATION MONITORING IN NPPS

  • Seo, In-Yong;Ha, Bok-Nam;Lee, Sung-Woo;Shin, Chang-Hoon;Kim, Seong-Jun
    • Nuclear Engineering and Technology, v.42 no.2, pp.219-230, 2010
  • In nuclear power plants (NPPs), periodic sensor calibrations are required to ensure that sensors are operating correctly. Because a sensor's operating status is checked only at each fuel outage, faulty sensors may remain undetected for up to 24 months. Moreover, only a few of the calibrated sensors are typically found to have been faulty. For safe NPP operation and to reduce unnecessary calibration, on-line instrument calibration monitoring is needed. In this study, principal component-based auto-associative support vector regression (PCSVR) using response surface methodology (RSM) is proposed for sensor signal validation in NPPs. This paper describes the design of a PCSVR-based sensor validation system for a power generation system. RSM is employed to determine the optimal values of the SVR hyperparameters and is compared with the genetic algorithm (GA). The proposed PCSVR model is validated with actual plant data from Kori Nuclear Power Plant Unit 3 and compared with auto-associative support vector regression (AASVR) and auto-associative neural network (AANN) models. Using PCA improves the auto-sensitivity of AASVR by a factor of about six, resulting in good detection of sensor drift. Compared with AANN, accuracy and cross-sensitivity are better, while auto-sensitivity is almost the same. Meanwhile, the proposed RSM-based optimization of the PCSVR algorithm performs even better than GA in terms of accuracy, auto-sensitivity, and averaged maximum error, though not in averaged RMS error, and it is much more time-efficient than the conventional GA method.
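
The auto-associative idea behind signal validation can be illustrated with plain PCA reconstruction. A model trained on normal-operation data maps each sensor vector back onto itself, and a persistent residual on one channel points to drift in that sensor. This is a deliberate simplification of PCSVR: there is no SVR stage and no RSM or GA tuning. The class name `PCAReconstructor`, the three redundant simulated sensors, and the 0.5 drift are assumptions.

```python
import numpy as np

class PCAReconstructor:
    """Auto-associative sketch: project standardized sensor vectors onto the
    leading PCs of normal-operation data and map them back; persistent
    residuals on one channel suggest drift in that sensor."""
    def __init__(self, n_components):
        self.k = n_components

    def fit(self, X):
        self.mu, self.sd = X.mean(axis=0), X.std(axis=0, ddof=1)
        Z = (X - self.mu) / self.sd
        evals, evecs = np.linalg.eigh(np.cov(Z, rowvar=False))
        self.V = evecs[:, np.argsort(evals)[::-1][: self.k]]
        return self

    def residuals(self, X):
        Z = (X - self.mu) / self.sd
        recon = (Z @ self.V @ self.V.T) * self.sd + self.mu
        return X - recon

# Three redundant simulated sensors measuring the same process signal
rng = np.random.default_rng(5)
process = np.sin(np.linspace(0, 20, 1000))
normal = process[:, None] + 0.02 * rng.normal(size=(1000, 3))
model = PCAReconstructor(n_components=1).fit(normal)
drifted = normal.copy()
drifted[:, 0] += 0.5                          # simulated drift on sensor 0
print(model.residuals(normal).mean(axis=0))
print(model.residuals(drifted).mean(axis=0))
```

Only part of the drift shows up in sensor 0's residual. The rest leaks into the reconstructions of the other channels, which is the cross-sensitivity the abstract measures.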

Mean Annual Precipitation Estimates of Korea

  • Kim, Seung;Kim, Gyu-Ho
    • Proceedings of the Korea Water Resources Association Conference, 1989.07a, pp.5-16, 1989
  • This study estimates the mean annual precipitation of Korea using data observed at 60 Korea Meteorological Service stations over the 84-year period 1905-1988. Missing or unobserved values are estimated by regression analysis with principal components, and the mean annual precipitation values obtained by the arithmetic, Thiessen, and isohyetal methods are compared.
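
The difference between the arithmetic and Thiessen means comes down to the weights. The arithmetic mean weights stations equally, while the Thiessen method weights each station by the area of its Voronoi polygon. The sketch below approximates those areas by assigning the cells of a fine grid to their nearest station. The rectangular unit-square region, the station coordinates, and the precipitation values are all hypothetical simplifications of a real study area such as Korea.

```python
import numpy as np

def thiessen_weights(stations, xlim, ylim, n=400):
    """Approximate Thiessen (Voronoi) area weights: assign each cell of a
    regular grid over a rectangular region to its nearest station."""
    gx, gy = np.meshgrid(np.linspace(*xlim, n), np.linspace(*ylim, n))
    cells = np.column_stack([gx.ravel(), gy.ravel()])
    d2 = ((cells[:, None, :] - stations[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(d2.argmin(axis=1), minlength=len(stations))
    return counts / counts.sum()

stations = np.array([[0.2, 0.2], [0.8, 0.2], [0.2, 0.8], [0.5, 0.9]])  # hypothetical
precip = np.array([1200.0, 1300.0, 1400.0, 1500.0])                   # hypothetical mm/yr
w = thiessen_weights(stations, (0.0, 1.0), (0.0, 1.0))
arithmetic_mean = precip.mean()
thiessen_mean = w @ precip
```

When stations are unevenly spaced, the two means diverge because isolated stations represent larger areas. The isohyetal method, which interpolates contour lines, is not sketched here.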


A Study on the Factor Analysis of the Encounter Data in the Maritime Traffic Environment

  • Kim, Kwang-Il;Jeong, Jung Sik;Park, Gyei-Kark
    • Journal of the Korean Institute of Intelligent Systems, v.25 no.3, pp.293-298, 2015
  • Vessel encounter data collected from vessel trajectories in maritime traffic make it possible to analyze vessel collision and near-collision risk with statistical methods. In this study, we apply factor analysis to variables extracted from vessel encounter data to determine the main factors affecting vessel collision risk. To calculate each factor, principal component analysis was used as the factor extraction method after the encounter variables were normalized and standardized. The factor analysis summarizes the main factors as a vessel approach factor and a collision avoidance variance factor.
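
Principal-component factor extraction on standardized variables can be sketched as an eigendecomposition of the correlation matrix, which is the covariance of the standardized variables. Each retained eigenvector is scaled by the square root of its eigenvalue to give loadings. The Kaiser rule (keep eigenvalues above 1), the function name, and the two simulated latent factors are assumptions, since the abstract does not state the retention criterion.

```python
import numpy as np

def pc_factor_loadings(X, min_eigenvalue=1.0):
    """Principal-component factor extraction: eigen-decompose the correlation
    matrix (the covariance of the standardized variables), keep components with
    eigenvalue > min_eigenvalue (Kaiser rule, an assumption here), and scale the
    eigenvectors by sqrt(eigenvalue) to obtain loadings."""
    evals, evecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    keep = evals > min_eigenvalue
    return evals, evecs[:, keep] * np.sqrt(evals[keep])

# Six simulated encounter variables driven by two hypothetical latent factors
rng = np.random.default_rng(6)
approach, avoid = rng.normal(size=(2, 1000))
noise = 0.3 * rng.normal(size=(1000, 6))
X = np.column_stack([approach] * 3 + [avoid] * 3) + noise
evals, loadings = pc_factor_loadings(X)          # two factors retained
```

Naming the factors (for example, "approach" versus "avoidance") is then a matter of inspecting which variables load heavily on each column. A rotation such as varimax is often applied first, but it is not shown here.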

Functional Data Analysis of Temperature and Precipitation Data

  • Kang, Kee-Hoon;Ahn, Hong-Se
    • The Korean Journal of Applied Statistics, v.19 no.3, pp.431-445, 2006
  • In this paper we review methods for analyzing functional data and illustrate a real application of functional data analysis. The review covers representing functional data with basis functions, analyzing functional variation with functional principal component analysis, and functional linear models. For the application, we use temperature and precipitation data measured in Korea from January 1970 to May 2004. We apply functional principal component analysis to each dataset and test the significance of a regional division based on sunshine hours. We also estimate a functional regression model for temperature and precipitation.
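
When all curves are sampled on the same equally spaced grid, functional PCA can be approximated by eigen-decomposing the sample covariance matrix multiplied by the grid spacing. Rescaling the eigenvectors by 1/sqrt(spacing) then gives eigenfunctions with unit L2 norm. The sketch below takes that discretized route instead of the basis-function representation the paper reviews. The grid on [0, 1] (think of it as a rescaled year) and the simulated sine/cosine curves are assumptions.

```python
import numpy as np

def discretized_fpca(curves, grid, n_components=2):
    """Functional PCA for curves sampled on a common, equally spaced grid.
    The covariance operator is approximated by the covariance matrix times the
    grid spacing, so eigenfunctions come out with unit L2 norm."""
    dt = grid[1] - grid[0]
    mean_fn = curves.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(curves, rowvar=False) * dt)
    order = np.argsort(evals)[::-1][:n_components]
    eigfuns = evecs[:, order] / np.sqrt(dt)          # unit-norm eigenfunctions
    scores = (curves - mean_fn) @ eigfuns * dt       # Riemann-sum FPC scores
    return mean_fn, evals[order], eigfuns, scores

# Simulated annual curves (e.g. a stand-in for station temperature profiles)
grid = np.linspace(0.0, 1.0, 101)
rng = np.random.default_rng(7)
a, b = rng.normal(scale=[[2.0], [1.0]], size=(2, 300))    # random amplitudes
curves = a[:, None] * np.sin(2 * np.pi * grid) + b[:, None] * np.cos(2 * np.pi * grid)
mean_fn, lam, phi, scores = discretized_fpca(curves, grid)
```

The leading eigenfunction recovers the sine pattern because its random amplitude has the larger variance. The FPC scores play the same role as ordinary PC scores in later testing or regression.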

Principal Components Logistic Regression based on Robust Estimation

  • Kim, Bu-Yong;Kahng, Myung-Wook;Jang, Hea-Won
    • The Korean Journal of Applied Statistics, v.22 no.3, pp.531-539, 2009
  • Logistic regression is widely used as a data mining technique for customer relationship management. The maximum likelihood estimator has a highly inflated variance when multicollinearity exists among the regressors, and it is not robust against outliers. We therefore propose robust principal components logistic regression to deal with both multicollinearity and outliers. We suggest a procedure for selecting principal components based on the condition index: when a condition index is larger than the cutoff value obtained from a model built on the basis of conjoint analysis, the corresponding principal component is removed from the logistic model. In addition, we employ a robust estimation algorithm that dampens the effect of outliers by applying appropriate weights and factors to the leverage points and vertical outliers identified by a V-mask type criterion. Monte Carlo simulation results indicate that the proposed procedure yields a higher rate of correct classification than the existing method.
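
The component-selection step can be sketched as follows. Compute the condition index sqrt(lambda_max / lambda_j) for each principal component of the predictor correlation matrix, drop components above a cutoff, and fit logistic regression on the remaining scores. Two assumptions apply. The cutoff of 30 is a commonly quoted rule of thumb standing in for the paper's conjoint-analysis-derived value. The fit is ordinary non-robust maximum likelihood by IRLS, so the V-mask outlier weighting is not reproduced. The simulated near-collinear data are hypothetical.

```python
import numpy as np

def pcs_by_condition_index(X, cutoff=30.0):
    """Drop principal components whose condition index sqrt(lambda_max / lambda_j)
    exceeds `cutoff`; return the retained PC scores and all condition indices."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    evals, evecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(evals)[::-1]
    evals = np.clip(evals[order], 1e-12, None)   # guard against tiny negative eigenvalues
    evecs = evecs[:, order]
    ci = np.sqrt(evals[0] / evals)
    return Z @ evecs[:, ci <= cutoff], ci

def logistic_irls(T, y, n_iter=25):
    """Ordinary (non-robust) maximum-likelihood logistic regression by IRLS."""
    A = np.column_stack([np.ones(len(y)), T])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        W = p * (1.0 - p)
        beta = beta + np.linalg.solve(A.T @ (A * W[:, None]), A.T @ (y - p))
    return beta

# Simulated customer data with one nearly redundant regressor
rng = np.random.default_rng(8)
x0, x1 = rng.normal(size=(2, 500))
X = np.column_stack([x0, x1, x0 + x1 + 1e-3 * rng.normal(size=500)])
y = (rng.uniform(size=500) < 1.0 / (1.0 + np.exp(-x0))).astype(float)
T, ci = pcs_by_condition_index(X)
beta = logistic_irls(T, y)
```

The near-redundant third column produces one component with a very large condition index. Dropping that component leaves two well-conditioned regressors, and IRLS on them is numerically stable.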