• Title/Summary/Keyword: principal component regression (주성분회귀법)

Performance Comparison of Data Mining Approaches for Prediction Models of Near Infrared Spectroscopy Data (근적외선 분광 데이터 예측 모형을 위한 데이터 마이닝 기법의 성능비교)

  • Baek, Seung Hyun
    • Journal of the Korea Safety Management & Science / v.15 no.4 / pp.311-315 / 2013
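  • This paper compares principal component regression (PCR) and partial least squares regression (PLSR). The purpose of the comparison is to find a suitable prediction method for analyzing near-infrared spectroscopy data that have a linear pattern. In this paper, PLSR shows slightly better predictive ability than PCR. PCR used 50 principal components to build its model, whereas PLSR used 12 latent components. Mean squared error was used as the measure of predictive ability. According to the analysis of the near-infrared spectroscopy data in this paper, PLSR proved to be the most suitable model for predicting data with a linear trend.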
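
The kind of comparison described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: the data are synthetic (a hypothetical five-factor stand-in for spectra), both models use 5 components rather than the paper's 50 and 12, PLS is implemented as a plain NIPALS PLS1, and all function names are invented for the example.

```python
import numpy as np

def fit_pcr(X, y, k):
    """PCR: regress y on the first k principal component scores of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                                   # (p, k) loadings
    T = Xc @ V                                     # (n, k) scores
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    beta = V @ gamma                               # back to original variables
    return beta, y_mean - x_mean @ beta

def fit_pls1(X, y, k):
    """PLS1 with k latent components, fitted by the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    p = X.shape[1]
    W, P, q = np.zeros((p, k)), np.zeros((p, k)), np.zeros(k)
    for a in range(k):
        w = Xk.T @ yk                              # direction of max covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                                 # latent score
        tt = t @ t
        P[:, a] = Xk.T @ t / tt                    # X loading
        q[a] = yk @ t / tt                         # y loading
        W[:, a] = w
        Xk = Xk - np.outer(t, P[:, a])             # deflate X
        yk = yk - q[a] * t                         # deflate y
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, y_mean - x_mean @ beta

def mse(X, y, beta, intercept):
    """Mean squared prediction error."""
    return float(np.mean((y - (X @ beta + intercept)) ** 2))

# Synthetic, highly collinear "spectra" (assumption: 5 hidden factors, 100 channels).
rng = np.random.default_rng(0)
n, p, r = 120, 100, 5
latent = rng.normal(size=(n, r))
X = latent @ rng.normal(size=(r, p)) + 0.05 * rng.normal(size=(n, p))
y = latent @ np.array([1.0, -2.0, 0.5, 1.5, -1.0]) + 0.1 * rng.normal(size=n)

X_tr, X_te, y_tr, y_te = X[:80], X[80:], y[:80], y[80:]
beta_pcr, b0_pcr = fit_pcr(X_tr, y_tr, k=5)
beta_pls, b0_pls = fit_pls1(X_tr, y_tr, k=5)
mse_pcr = mse(X_te, y_te, beta_pcr, b0_pcr)
mse_pls = mse(X_te, y_te, beta_pls, b0_pls)
print(f"test MSE  PCR: {mse_pcr:.4f}  PLS: {mse_pls:.4f}")
```

In practice the number of components for each method would be chosen by cross-validation on the held-out MSE, which is presumably how differing counts such as 50 and 12 arise.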

Principal Components Logistic Regression based on Robust Estimation (로버스트추정에 바탕을 둔 주성분로지스틱회귀)

  • Kim, Bu-Yong;Kahng, Myung-Wook;Jang, Hea-Won
    • The Korean Journal of Applied Statistics / v.22 no.3 / pp.531-539 / 2009
  • Logistic regression is widely used as a data mining technique for customer relationship management. The maximum likelihood estimator has highly inflated variance when multicollinearity exists among the regressors, and it is not robust against outliers. We therefore propose robust principal components logistic regression to deal with both the multicollinearity and outlier problems. A procedure based on the condition index is suggested for selecting principal components: when a condition index is larger than the cutoff value obtained from a model constructed on the basis of conjoint analysis, the corresponding principal component is removed from the logistic model. In addition, we employ a robust estimation algorithm that dampens the effect of outliers by applying appropriate weights and factors to the leverage points and vertical outliers identified by a V-mask type criterion. Monte Carlo simulation results indicate that the proposed procedure yields a higher rate of correct classification than the existing method.
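
The component-selection rule in the abstract above (drop a principal component when its condition index exceeds a cutoff) might look roughly like the sketch below. Several things here are assumptions rather than the authors' method: the condition index is computed from the correlation matrix of the regressors as sqrt(largest eigenvalue / eigenvalue j), the cutoff of 30 is only a common rule of thumb standing in for the paper's conjoint-analysis-based cutoff, and the function name is hypothetical. The robust estimation step is not shown.

```python
import numpy as np

def select_components_by_condition_index(X, cutoff=30.0):
    """Return PC scores whose condition index is at most `cutoff` (hypothetical helper)."""
    # Standardize so the eigen-decomposition is of the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]              # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigvals = np.clip(eigvals, 1e-12, None)        # guard against tiny negatives
    cond_index = np.sqrt(eigvals[0] / eigvals)     # 1.0 for the first component
    keep = cond_index <= cutoff
    return Z @ eigvecs[:, keep], cond_index, keep

# Toy regressors: x3 is almost an exact linear combination of x1 and x2.
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=200), rng.normal(size=200)
x3 = x1 + x2 + 0.001 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

scores, cond_index, keep = select_components_by_condition_index(X)
print("condition indices:", np.round(cond_index, 1))
print("components kept:", keep)
```

The retained scores would then serve as the regressors of the logistic model in place of the original collinear variables.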

A Study on Patterning and Grading by the Impact of Traffic Culture Index (교통문화지수 영향요인에 의한 유형화와 영향정도에 관한 연구)

  • Jeong Cheal-Woo;Jung Hun-Young;Ko Sang-Sean
    • Journal of Navigation and Port Research / v.30 no.1 s.107 / pp.35-43 / 2006
  • This study suggests strategies to prevent traffic accidents by using the impact factors of each cluster and the typical patterns of 81 cities, based on a statistical analysis of data on the Traffic Culture Index (TCI), which was developed through a partnership between the Traffic Safety Authority and the Green Traffic Movement Corporation in 2002 and 2003. Principal component analysis and cluster analysis of the impact factors and the TCI yield 4 components and 4 clusters. In addition, stepwise multiple regression analysis of the relationship between the impact factors and the TCI gives high $R^2$ values for all clusters. Based on these results, we suggest concrete strategies to prevent traffic accidents for each cluster, and we note that future work should analyze how effective the invested facilities are in reducing traffic accidents.

Design of Regression Model and Pattern Classifier by Using Principal Component Analysis (주성분 분석법을 이용한 회귀다항식 기반 모델 및 패턴 분류기 설계)

  • Roh, Seok-Beom;Lee, Dong-Yoon
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.10 no.6 / pp.594-600 / 2017
  • This paper introduces a new design methodology for prediction models and pattern classifiers based on principal component analysis, a dimension reduction algorithm. Principal component analysis is one of the dimension reduction techniques used to reduce the dimension of the input space and extract useful features from the original input variables. The extracted features are used as the input variables of the prediction model and pattern classifier. Both are based on a very simple regression, which is the key point of the paper. Their structural simplicity reduces the over-fitting problem. Several machine learning data sets are used to validate the proposed prediction model and pattern classifier.

Simultaneous Determination of Anionic and Nonionic Surfactants Using Multivariate Calibration Method (다변량 분석법에 의한 Anionic Surfactant와 Nonionic Surfactant의 동시정량)

  • Sang Hak Lee;Soon Nam Kwon;Bum Mok Son
    • Journal of the Korean Chemical Society / v.47 no.1 / pp.19-25 / 2003
  • A spectrophotometric method for the simultaneous determination of anionic and nonionic surfactants, based on multivariate calibration methods such as principal component regression (PCR) and partial least squares (PLS), has been studied. The PCR and PLS calibration models were obtained from spectral data in the range of 400~700 nm for a calibration set of 26 standards, each containing different amounts of the two surfactants. The relative standard error of prediction (RSEP$_{\alpha}$) was obtained to assess how well the models quantify each analyte in 5 validation samples, each containing different amounts of the two surfactants.
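
The abstract above does not state how the relative standard error of prediction is computed. A definition commonly used in multivariate calibration, given here only as an assumption about what the authors likely mean, is the following, where $c_i$ is the known concentration of the analyte in validation sample $i$, $\hat{c}_i$ is the concentration predicted by the PCR or PLS model, and $N$ is the number of validation samples (5 in this study):

$$
\mathrm{RSEP}\,(\%) = 100 \times \sqrt{\frac{\sum_{i=1}^{N} \left(\hat{c}_i - c_i\right)^2}{\sum_{i=1}^{N} c_i^{2}}}
$$

Under this definition a smaller value indicates better prediction, and the statistic is computed separately for each of the two surfactants.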

An Empirical Study on the Activation Approach for the Competitive Power of Korean Shipping Company in the Korea-China Liner Routes (국적선사의 경쟁력 강화를 위한 한중정기항로 활성화 방안에 대한 실증연구)

  • Lee, Yong-Ho
    • Journal of Navigation and Port Research / v.27 no.2 / pp.163-170 / 2003
  • This empirical study examines an activation approach for the competitive power of Korean shipping companies on the Korea-China liner routes. Data were collected from Korean, Chinese, and third-flag shipping companies through 500 questionnaires. The data from 250 respondents were analyzed statistically to verify the hypotheses and to derive a regression equation that predicts how strongly each determinant influences the competitive advantage of Korean shipping companies on the Korea-China liner shipping routes. Factor Analysis, Cronbach's Alpha, Principal Analysis, and Multiple Regression Analysis were used to test the hypotheses.

Analysis on Correlation between AE Parameters and Stress Intensity Factor using Principal Component Regression and Artificial Neural Network (주성분 회귀분석 및 인공신경망을 이용한 AE변수와 응력확대계수와의 상관관계 해석)

  • Kim, Ki-Bok;Yoon, Dong-Jin;Jeong, Jung-Chae;Park, Phi-Iip;Lee, Seung-Seok
    • Journal of the Korean Society for Nondestructive Testing / v.21 no.1 / pp.80-90 / 2001
  • The aim of this study is to develop a methodology for identifying mechanical properties of a machine element, such as the stress intensity factor, from acoustic emission (AE) parameters. Considering the multivariate and nonlinear properties of AE parameters such as ringdown count, rise time, energy, event duration, and peak amplitude from fatigue cracks of a machine element, principal component regression (PCR) and artificial neural network (ANN) models for estimating the stress intensity factor were developed and validated. The AE parameters were found to be very significant for estimating the stress intensity factor. Since the statistical values, including correlation coefficients, standard error of calibration, standard error of prediction, and bias, were stable, the PCR and ANN models for the stress intensity factor were very robust. The ANN model performed better than the PCR model on unknown stress intensity factor data.

Establishment of Strategy for Management of Technology Using Data Mining Technique (데이터 마이닝을 통한 기술경영 전략 수립에 관한 연구)

  • Lee, Junseok;Lee, Joonhyuck;Kim, Gabjo;Park, Sangsung;Jang, Dongsik
    • Journal of the Korean Institute of Intelligent Systems / v.25 no.2 / pp.126-132 / 2015
  • Technology forecasting aims to understand the future status of a specific technology based on current data about it, and it is useful when planning technology management strategies. Today it is common for countries, companies, and researchers to establish R&D directions and strategies by drawing on experts' opinions. However, this qualitative approach to technology forecasting is costly and time-consuming, since it requires collecting a variety of opinions and analyses from many experts. To address these limitations, quantitative methods of technology forecasting are being studied to secure objective forecasts and support the R&D decision-making process. This paper suggests a technology forecasting methodology based on quantitative analysis. The methodology consists of data collection, principal component analysis, and technology forecasting by logistic regression, which is one of the data mining techniques. In this research, patent documents related to autonomous vehicles are collected. The text of the patent documents is then extracted with text mining techniques and put into a form suitable for analysis. After principal component analysis, logistic regression is performed on the principal component scores. Based on the result, it is possible to analyze the state of R&D and forecast the technology.
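
The last two steps of the pipeline described above, principal component analysis followed by logistic regression on the component scores, can be illustrated with a small sketch. Everything concrete here is an assumption made for the example: the data are synthetic rather than patent text features, two components are kept, the logistic model is fitted by plain gradient descent, and the helper names are hypothetical.

```python
import numpy as np

def pca_scores(X, k):
    """Project centered data onto its first k principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def fit_logistic(T, y, lr=0.1, n_iter=2000):
    """Logistic regression by gradient descent on the mean log-loss."""
    A = np.column_stack([np.ones(len(T)), T])      # prepend intercept column
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-np.clip(A @ w, -30, 30)))
        w -= lr * A.T @ (prob - y) / len(y)
    return w

def predict(T, w):
    """Predict class 1 where the fitted linear predictor is positive."""
    A = np.column_stack([np.ones(len(T)), T])
    return (A @ w > 0).astype(int)

# Synthetic stand-in for a document-by-feature matrix: 20 noisy features
# driven by one hidden factor z, with the binary label given by the sign of z.
rng = np.random.default_rng(1)
z = rng.normal(size=200)
X = np.outer(z, np.ones(20)) + 0.5 * rng.normal(size=(200, 20))
y = (z > 0).astype(int)

T = pca_scores(X, k=2)                             # principal component scores
w = fit_logistic(T, y)
accuracy = float(np.mean(predict(T, w) == y))
print(f"training accuracy: {accuracy:.3f}")
```

Fitting on scores instead of the raw features keeps the regressors uncorrelated and few in number, which is the usual motivation for placing PCA before logistic regression.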

Asymptotic Test for Dimensionality in Sliced Inverse Regression (분할 역회귀모형에서 차원결정을 위한 점근검정법)

  • Park, Chang-Sun;Kwak, Jae-Guen
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.381-393 / 2005
  • Sliced Inverse Regression (SIR) and an associated chi-square test for dimensionality were introduced by Li (1991) as a promising technique for dimension reduction in regression analysis. However, Li's test requires the assumption of normality for the predictors and was found to depend heavily on the number of slices. We provide a unified asymptotic test for determining the dimensionality of the SIR model, which is based on probabilistic principal component analysis and is free of the normality assumption on predictors. Illustrative results with simulated and real examples are also provided.

Detecting Influential Observations in Multivariate Statistical Analysis of Incomplete Data by PCA (주성분분석에 의한 결손 자료의 영향값 검출에 대한 연구)

  • 김현정;문승호;신재경
    • The Korean Journal of Applied Statistics / v.13 no.2 / pp.383-392 / 2000
  • Since the late 1970s, methods of influence or sensitivity analysis for detecting influential observations have been studied not only in regression and related methods but also in various multivariate methods. When the results of a multivariate analysis depend heavily on a small number of observations, conclusions should be drawn with great care. Similar phenomena may also occur with incomplete data. In this research we study such influential observations in the multivariate statistical analysis of incomplete data. The case of principal component analysis is studied with a numerical example.
