• Title/Summary/Keyword: principal component regression analysis (주성분회귀분석)

Damage Prediction Using Heavy Rain Risk Assessment (호우 위험도 평가를 이용한 피해예측)

  • Kim, Jong Sung;Choi, Chang Hyun;Lee, Jong So;Kim, Hung Soo
    • Proceedings of the Korea Water Resources Association Conference / 2017.05a / pp.154-154 / 2017
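  • Under the influence of global climate variability and climate change, the frequency and intensity of natural disasters that cause large-scale loss of life and property damage are increasing. To establish effective countermeasures under these changing conditions, the degree of disaster risk in each region should first be assessed by considering exposure characteristics together with regional characteristics, and the affected area and extent of damage need to be predicted before a disaster occurs. This study therefore targets heavy rain damage, which accounts for more than 65% of natural disaster damage in Korea, and proposes a Heavy rain Damage Risk Index (HDRI) based on the PSR (Pressure-State-Response) framework to assess heavy rain risk. It also develops heavy rain damage prediction functions for each derived regional risk grade in order to predict the approximate extent of damage before a disaster occurs. First, pressure, state, and response indicators were constructed for the regional heavy rain risk assessment, and the evaluation indicators were selected using principal component analysis. The HDRI was then derived by assigning equal weights to the selected indicators. Of the 31 local governments in Gyeonggi-do, 15 were classified as grade 1 (the safest), 7 as grade 2, and 9 as grade 3. For each risk grade, data such as total rainfall during the disaster period, number of disaster days, antecedent rainfall (1~5 days), and maximum rainfall by duration (1~24 hours) were compiled as explanatory variables, and prediction functions were developed using a multiple regression model and principal component analysis. The grade-specific prediction functions had an N-RMSE of 12~18% and were judged to predict heavy rain damage adequately. This study makes it possible to identify the heavy rain damage risk grade of each local government and, through the grade-specific prediction functions, to estimate the occurrence and scale of heavy rain damage in advance. The results can therefore serve as basic data for local governments and relevant ministries in establishing effective disaster prevention systems.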

Detecting Influential Observations in Multivariate Statistical Analysis of Incomplete Data by PCA (주성분분석에 의한 결손 자료의 영향값 검출에 대한 연구)

  • 김현정;문승호;신재경
    • The Korean Journal of Applied Statistics / v.13 no.2 / pp.383-392 / 2000
  • Since the late 1970s, methods of influence or sensitivity analysis for detecting influential observations have been studied not only in regression and related methods but also in various multivariate methods. When the results of a multivariate analysis depend heavily on a small number of observations, conclusions should be drawn with great care. Similar phenomena may also occur with incomplete data. In this research we study such influential observations in the multivariate statistical analysis of incomplete data. The case of principal component analysis is studied with a numerical example.

A Study of Prediction on Company's Growth with R and Analysis Algorithm (R과 분석 알고리즘을 활용한 기업의 성장성 예측에 관한 연구)

  • Kang, Hui-Seok;Kim, Kyung-Su;Ryu, Ji-Seung;Lee, Ga-Yeon;Lee, Min-Jung
    • Proceedings of the Korea Information Processing Society Conference / 2017.11a / pp.428-431 / 2017
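  • This study analyzes corporate growth potential and stock value using a variety of algorithms, based on structured data such as sales, cost of sales, and operating margin and on unstructured data such as economic and business news, and verifies the significance of the results. The results from principal component regression, an artificial neural network, a naive Bayes classifier, and a positive/negative sentiment lexicon model are reviewed to assess the performance of each model, and the models and data that can be used to predict corporate growth are presented.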

Simultaneous Determination of Anionic and Nonionic Surfactants Using Multivariate Calibration Method (다변량 분석법에 의한 Anionic Surfactant와 Nonionic Surfactant의 동시정량)

  • Sang Hak Lee;Soon Nam Kwon;Bum Mok Son
    • Journal of the Korean Chemical Society / v.47 no.1 / pp.19-25 / 2003
  • A spectrophotometric method for the simultaneous determination of anionic and nonionic surfactants, based on multivariate calibration methods such as principal component regression (PCR) and partial least squares (PLS), has been studied. The PCR and PLS calibration models were obtained from spectral data in the range of 400~700 nm for a calibration set of 26 standards, each containing different amounts of the two surfactants. The relative standard error of prediction (RSEP$_{\alpha}$) was obtained to assess how well the models quantify each analyte in 5 validation samples, each containing different amounts of the two surfactants.
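The abstract above reports RSEP without giving its formula. The sketch below uses one definition that is common in multivariate calibration work, 100 × sqrt(Σ(ĉ − c)² / Σc²); this is an assumption, not necessarily the authors' definition. The function name `rsep` and the concentration values are hypothetical.

```python
import numpy as np

def rsep(c_true, c_pred):
    """Relative standard error of prediction, in percent (assumed definition)."""
    c_true = np.asarray(c_true, dtype=float)
    c_pred = np.asarray(c_pred, dtype=float)
    # Squared prediction error relative to the squared true concentrations.
    return 100.0 * np.sqrt(np.sum((c_pred - c_true) ** 2) / np.sum(c_true ** 2))

# Hypothetical concentrations for 5 validation samples (arbitrary units).
c_true = [1.0, 2.0, 3.0, 4.0, 5.0]
c_pred = [1.1, 1.9, 3.2, 3.9, 5.1]
value = rsep(c_true, c_pred)
print(f"RSEP = {value:.2f}%")
```

In a two-analyte calibration the same function would be applied once per analyte, giving one RSEP value for each surfactant.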

Classical testing based on B-splines in functional linear models (함수형 선형모형에서의 B-스플라인에 기초한 검정)

  • Sohn, Jihoon;Lee, Eun Ryung
    • The Korean Journal of Applied Statistics / v.32 no.4 / pp.607-618 / 2019
  • A new and interesting task in statistics is to effectively analyze functional data, which arise frequently from advances in modern science and technology in areas such as meteorology and biomedical sciences. Functional linear regression with a scalar response is a popular functional data analysis technique, and a common problem is to determine whether a functional predictor variable affects the scalar response in the model. Recently, Kong et al. (Journal of Nonparametric Statistics, 28, 813-838, 2016) established classical testing methods for this problem based on functional principal component analysis of the functional predictor, that is, using the resulting eigenfunctions as a basis. However, eigenbasis functions are not generally suitable for regression purposes because they reflect only the variability of the functional predictor, not the functional association of interest in the testing problem. Additionally, the eigenfunctions must be estimated from data, so estimation errors may affect the performance of the testing procedures. To circumvent these issues, we propose a testing method based on a fixed basis such as B-splines and show via simulations that it works well. Simulated and real data examples also illustrate that the proposed testing method provides more effective and intuitive results owing to the localization properties of B-splines.

A Study on the Prediction of Fuel Consumption of a Ship Using the Principal Component Analysis (주성분 분석기법을 이용한 선박의 연료소비 예측에 관한 연구)

  • Kim, Young-Rong;Kim, Gujong;Park, Jun-Bum
    • Journal of Navigation and Port Research / v.43 no.6 / pp.335-343 / 2019
  • As regulations on ship exhaust gas have recently been strengthened, many measures to reduce fuel consumption are under consideration. Among them, research on machine-learning models that predict fuel consumption from data collected on ships has been active. However, many studies have not sufficiently considered how the main model parameters are selected or how the collected data are processed, and indiscriminate use of data may cause problems such as multicollinearity between variables. In this study, we propose a method that uses principal component analysis to predict the fuel consumption of a ship while addressing these problems. Principal component analysis was performed on the operational data of a 13K TEU container ship, and the fuel consumption prediction model was implemented by regression analysis on the extracted components. As the R-squared value of the model on the test data was 82.99%, the model is expected to support operators' decision-making in voyage planning and to contribute to monitoring the energy-efficient operation of ships during voyages.
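The workflow in the abstract above (PCA on the predictors, then regression on the extracted components) is principal component regression. The following is a minimal sketch of that technique on synthetic, deliberately collinear data; it is not the authors' model, and the function names `fit_pcr`, `predict_pcr`, and `r_squared` as well as all data values are hypothetical.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Fit PCR: PCA on standardized X, then least squares on the component scores."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_components].T
    scores = Z @ V
    # Ordinary least squares of y on an intercept plus the retained scores.
    A = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {"mean": mean, "std": std, "V": V, "coef": coef}

def predict_pcr(model, X):
    """Project new data onto the stored components and apply the regression."""
    Z = (X - model["mean"]) / model["std"]
    scores = Z @ model["V"]
    return model["coef"][0] + scores @ model["coef"][1:]

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in for operational data: four predictors driven by two latent factors.
rng = np.random.default_rng(0)
n = 400
latent = rng.normal(size=(n, 2))
X = np.column_stack([
    latent[:, 0],
    latent[:, 0] + 0.01 * rng.normal(size=n),  # near-duplicate of column 0
    latent[:, 1],
    latent[:, 1] + 0.01 * rng.normal(size=n),  # near-duplicate of column 2
])
y = 3.0 * latent[:, 0] - 2.0 * latent[:, 1] + 0.1 * rng.normal(size=n)

# Hold out the last 100 rows as test data.
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]
model = fit_pcr(X_train, y_train, n_components=2)
r2_test = r_squared(y_test, predict_pcr(model, X_test))
print(f"test R-squared: {r2_test:.3f}")
```

Because the retained components are mutually orthogonal, the regression step avoids the unstable coefficients that near-duplicate predictors would otherwise produce. The number of components is a tuning choice; here it is set to the known number of latent factors.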

Reliability Analysis of VOC Data for Opinion Mining (오피니언 마이닝을 위한 VOC 데이타의 신뢰성 분석)

  • Kim, Dongwon;Yu, Song Jin
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.217-245 / 2016
  • The purpose of this study is to verify how 7 sentiment domains extracted from social media through sentiment analysis influence business performance. It consists of three phases. In phase I, we crawled 45,447 pieces of VOC (Voice of the Customer) on 26 auto companies from a car community, extracted the POS information, constructed the sentiment lexicon, and built seven sentiment domains. In phase II, to ensure the reliability of the experimental data, we performed auto-correlation analysis and PCA. In phase III, we used linear regression analysis to investigate how the 7 domains affect the market share of three major auto companies (GM, FCA, and VOLKSWAGEN). The auto-correlation analysis confirmed auto-correlation and the sequence of the sentiments, and the PCA results showed that the 7 sentiments are connected with positivity, negativity, and neutrality. From the linear regression analysis on model 1, we identified that the sentiment factors have a significant influence on actual market share. In particular, not only the positive and negative sentiment domains but also the neutral sentiment domain had a significant impact on auto market share. By applying the available data to the market and taking advantage of the auto-correlation between market-related information and sentiment, the findings can contribute to other research on sentiment analysis as well as to actual business performance.

Principal Component Analysis of GPS Height Time Series from 14 Permanent GPS Stations Operated by National Geographic Information Institute (주성분분석을 통한 국토지리정보원 14개 GPS 상시관측소 수직좌표 시계열 분석)

  • Kim, Kyeong-Hui;Park, Kwan-Dong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.28 no.3 / pp.361-367 / 2010
  • We produced continuous vertical time series for 14 permanent GPS stations operated by the National Geographic Information Institute by processing about five years of data. We then computed the height velocities by fitting a linear regression to those time series, and performed principal component analysis to understand the overall characteristics of the series. The prominent signal obtained as the first mode of the PCA results showed an average vertical velocity of 4.2 mm/yr. The values of the first-mode eigenvectors were consistent at all sites. Thus, we concluded that all 14 stations were uplifting at nearly the same velocity during the test period. We then analyzed the change in precision before and after removing the first-mode signal from the 14 height time series. As a result, the precision improved by 34.8% on average.
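The abstract above extracts the first PCA mode from a set of station time series and then removes it. A minimal sketch of that idea follows, assuming synthetic height residuals in which every station shares one signal plus independent noise; the data, the noise levels, and the use of the per-station standard deviation as the precision measure are all assumptions, not the authors' processing.

```python
import numpy as np

rng = np.random.default_rng(1)
n_epochs, n_stations = 500, 14

# Hypothetical height residuals (mm): one signal shared by all stations
# plus independent noise at each station.
common = 5.0 * np.sin(np.linspace(0.0, 10.0 * np.pi, n_epochs))
heights = common[:, None] + rng.normal(scale=2.0, size=(n_epochs, n_stations))

# PCA via SVD of the column-centered data matrix (epochs x stations).
centered = heights - heights.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

# First mode: temporal pattern U[:, 0] * s[0] with spatial eigenvector Vt[0].
first_mode = np.outer(U[:, 0] * s[0], Vt[0])
filtered = centered - first_mode

# Scatter of each station's series before and after removing the first mode.
std_before = centered.std(axis=0, ddof=1)
std_after = filtered.std(axis=0, ddof=1)
improvement = 100.0 * (1.0 - std_after / std_before).mean()
explained = s[0] ** 2 / np.sum(s ** 2)

print(f"first mode explains {100.0 * explained:.1f}% of the variance")
print(f"mean reduction in scatter: {improvement:.1f}%")
```

When the entries of `Vt[0]` all have the same sign and similar magnitude, the first mode represents a signal common to every station, which corresponds to the consistency of the first-mode eigenvectors noted in the abstract.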

Prediction of golf scores on the PGA tour using statistical models (PGA 투어의 골프 스코어 예측 및 분석)

  • Lim, Jungeun;Lim, Youngin;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.41-55 / 2017
  • This study predicts the average scores of the top 150 PGA golf players in 132 PGA Tour tournaments (2013-2015) using data mining techniques and statistical analysis. It also aims to predict the Top 10 and Top 25 players in 4 different playoffs. Linear and nonlinear regression methods were used to predict average scores. Stepwise regression, all best subset, LASSO, ridge regression, and principal component regression were used as the linear regression methods. Tree, bagging, gradient boosting, neural network, random forests, and KNN were used as the nonlinear regression methods. We found that the average score increases as fairway firmness, green height, or average maximum wind speed increases. We also found that the average score decreases as the number of one-putts, the scrambling variable, or the longest driving distance increases. All 11 models have low prediction error when predicting the average scores of the 2015 PGA tournaments, which were not included in the training set. However, the Bagging and Random Forest models perform best among all models, and these two models have the highest prediction accuracy when predicting the Top 10 and Top 25 players in the 4 different playoffs.

Synthetic data generation by probabilistic PCA (주성분 분석을 활용한 재현자료 생성)

  • Min-Jeong Park
    • The Korean Journal of Applied Statistics / v.36 no.4 / pp.279-294 / 2023
  • Generating synthetic data sets by the sequential regression multiple imputation (SRMI) method is well known. The R package synthpop is widely used for generating synthetic data by the SRMI approach. In this paper, I suggest generating synthetic data based on the probabilistic principal component analysis (PPCA) method. Two simple data sets are used in a simulation study to compare the SRMI and PPCA approaches. The simulation results demonstrate that pairwise coefficients in synthetic data sets generated by PPCA can be closer to the original ones than those generated by SRMI. Furthermore, for the various data types for which PPCA applications are well established, such as time series data, the PPCA approach can be extended to generate synthetic data sets.