• Title/Summary/Keyword: principal component regression

Detecting Influential Observations in Multivariate Statistical Analysis of Incomplete Data by PCA (주성분분석에 의한 결손 자료의 영향값 검출에 대한 연구)

  • 김현정;문승호;신재경
    • The Korean Journal of Applied Statistics / v.13 no.2 / pp.383-392 / 2000
  • Since the late 1970s, methods of influence or sensitivity analysis for detecting influential observations have been studied not only in regression and related methods but also in various multivariate methods. If the results of a multivariate analysis depend heavily on a small number of observations, conclusions should be drawn with great care. Similar phenomena may also occur with incomplete data. In this research we study such influential observations in the multivariate statistical analysis of incomplete data. The case of principal component analysis is studied with a numerical example.

Effects of Environmental Factors on Aeromonas spp. Population in Naktong Estuary (낙동강 하구 생태계의 환경요인과 Aeromonas spp. 분포와의 관계)

  • 전도용;권오섭;하영칠
    • Korean Journal of Microbiology / v.27 no.4 / pp.391-397 / 1989
  • Populations of Aeromonas and environmental parameters were investigated at three sites in the Naktong Estuary from August 1986 to December 1986. Aeromonas counts ranged from $4.3\times10^{2}$ to $4.6\times 10^{4}$ MPN/100 mL. ANOVA indicated significant differences among the Aeromonas populations at the sites. The highest population of Aeromonas occurred at site 2, and the lowest at site 3-B. To examine the effects of environmental parameters on the distribution of Aeromonas spp., principal component analysis and stepwise multiple regression were used. The results showed that the distribution of Aeromonas spp. was mainly influenced by the outflow of freshwater and the inflow of inorganic nutrients, and was correlated with heterotrophic bacteria, available nitrogen, fecal coliform bacteria, and temperature.

Synthetic data generation by probabilistic PCA (주성분 분석을 활용한 재현자료 생성)

  • Min-Jeong Park
    • The Korean Journal of Applied Statistics / v.36 no.4 / pp.279-294 / 2023
  • Generating synthetic data sets by the sequential regression multiple imputation (SRMI) method is well established. The R package synthpop is widely used for generating synthetic data by the SRMI approach. In this paper, I suggest generating synthetic data based on the probabilistic principal component analysis (PPCA) method. Two simple data sets are used in a simulation study to compare the SRMI and PPCA approaches. Simulation results demonstrate that pairwise coefficients in synthetic data sets generated by PPCA can be closer to the original ones than those generated by SRMI. Furthermore, for data types where PPCA applications are well established, such as time series data, the PPCA approach can be extended to generate synthetic data sets.
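
As a rough illustration of the PPCA idea in the abstract above, the following sketch fits the closed-form maximum-likelihood PPCA model and draws synthetic rows from the fitted Gaussian. It is not the paper's procedure: the function names `ppca_fit` and `ppca_sample`, the toy data, and the choice of `q` are assumptions made for illustration, and the sketch assumes continuous, roughly Gaussian variables with `q` smaller than the number of columns.

```python
import numpy as np

def ppca_fit(X, q):
    """Closed-form ML fit of PPCA: x = mu + W z + eps, z ~ N(0, I), eps ~ N(0, sigma2 I)."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)      # ML covariance (divides by n)
    evals, evecs = np.linalg.eigh(S)            # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]             # reorder to descending
    evals, evecs = evals[order], evecs[:, order]
    sigma2 = evals[q:].mean()                   # noise variance: mean of discarded eigenvalues
    scale = np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    W = evecs[:, :q] * scale                    # (p, q) loading matrix
    return mu, W, sigma2

def ppca_sample(mu, W, sigma2, n, rng):
    """Draw n synthetic rows from the fitted PPCA model."""
    z = rng.standard_normal((n, W.shape[1]))                    # latent scores
    eps = rng.standard_normal((n, mu.size)) * np.sqrt(sigma2)   # isotropic noise
    return mu + z @ W.T + eps

# Hypothetical toy data: 200 rows, 5 correlated columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
mu, W, sigma2 = ppca_fit(X, q=2)
X_syn = ppca_sample(mu, W, sigma2, n=200, rng=rng)
```

Comparing the pairwise correlations of `X_syn` and `X` (for example with `np.corrcoef`) is one simple way to check how closely the synthetic data track the original.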

Classical testing based on B-splines in functional linear models (함수형 선형모형에서의 B-스플라인에 기초한 검정)

  • Sohn, Jihoon;Lee, Eun Ryung
    • The Korean Journal of Applied Statistics / v.32 no.4 / pp.607-618 / 2019
  • A new and interesting task in statistics is the effective analysis of functional data, which arise frequently from advances in modern science and technology in areas such as meteorology and biomedical sciences. Functional linear regression with a scalar response is a popular functional data analysis technique, and a common problem is to test for a functional association, that is, whether a functional predictor variable affects the scalar response in the model. Recently, Kong et al. (Journal of Nonparametric Statistics, 28, 813-838, 2016) established classical testing methods for this problem based on functional principal component analysis of the functional predictor, using the resulting eigenfunctions as a basis. However, eigenbasis functions are not generally suitable for regression purposes because they capture only the variability of the functional predictor, not the functional association of interest in the testing problem. Additionally, the eigenfunctions must be estimated from data, so estimation errors may affect the performance of the testing procedures. To circumvent these issues, we propose a testing method based on a fixed basis such as B-splines and show via simulations that it works well. Simulated and real data examples also illustrate that the proposed testing method provides more effective and intuitive results owing to the localization properties of B-splines.

A Study on Patterning and Grading by the Impact of Traffic Culture Index (교통문화지수 영향요인에 의한 유형화와 영향정도에 관한 연구)

  • Jeong Cheal-Woo;Jung Hun-Young;Ko Sang-Sean
    • Journal of Navigation and Port Research / v.30 no.1 s.107 / pp.35-43 / 2006
  • This study suggests strategies to prevent traffic accidents, using the impact factors of each cluster and the typical patterns of 81 cities. It is based on a statistical analysis of data on the Traffic Culture Index (TCI), which was developed through a partnership of the Traffic Safety Authority and the Green Traffic Movement Corporation in 2002 and 2003. Principal component analysis and cluster analysis of the impact factors and the TCI yield 4 components and 4 clusters. In addition, stepwise multiple regression analysis of the relationship between the impact factors and the TCI gives high $R^2$ values for the models of all clusters. Based on these results, we suggest concrete accident-prevention strategies for each cluster; future work should analyze how effective the invested facilities are in reducing traffic accidents.

Simultaneous Determination of Tryptophan and Tyrosine by Spectrofluorimetry Using Multivariate Calibration Method (다변량 분석법을 이용한 Tryptophan과 Tyrosine의 형광분광법적 정량)

  • Lee, Sang-Hak;Park, Ju-Eun;Son, Beom-Mok
    • Journal of the Korean Chemical Society / v.46 no.4 / pp.309-317 / 2002
  • A spectrofluorimetric method for the simultaneous determination of two amino acids (tryptophan and tyrosine) has been studied, based on applying multivariate calibration methods such as principal component regression (PCR) and partial least squares (PLS) to luminescence measurements. Emission spectra of synthetic mixtures of the two amino acids were obtained at an excitation wavelength of 257 nm. The PCR and PLS calibration models were built from spectral data in the range 280-500 nm for a calibration set of 32 standards, each containing different amounts of the two amino acids. The relative standard error of prediction ($RSEP_a$) was used to assess how well the model quantifies each analyte in a validation set. The overall relative standard error of prediction for the mixture ($RSEP_m$), obtained from a validation set of 6 independent mixtures, was also used to validate the method.
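
The abstract above does not give the RSEP formula. A definition commonly used in chemometrics, assumed here, is $100\sqrt{\sum_i (\hat{c}_i - c_i)^2 / \sum_i c_i^2}$, where $c_i$ is the known concentration and $\hat{c}_i$ the predicted one. The sketch below computes it per analyte and, as a further assumption, pools all analytes for the mixture-level value; the function names and the numbers are hypothetical.

```python
import math

def rsep(true_conc, pred_conc):
    """Relative standard error of prediction, in percent (assumed definition)."""
    if len(true_conc) != len(pred_conc):
        raise ValueError("true and predicted concentrations must have equal length")
    num = sum((p - t) ** 2 for t, p in zip(true_conc, pred_conc))
    den = sum(t ** 2 for t in true_conc)
    return 100.0 * math.sqrt(num / den)

def rsep_mixture(true_by_analyte, pred_by_analyte):
    """Overall RSEP, pooling every analyte's validation samples (assumed pooling)."""
    true_all = [t for analyte in true_by_analyte for t in analyte]
    pred_all = [p for analyte in pred_by_analyte for p in analyte]
    return rsep(true_all, pred_all)

# Hypothetical validation values for two analytes (arbitrary concentration units).
trp_true, trp_pred = [1.0, 2.0, 3.0], [1.1, 1.9, 3.2]
tyr_true, tyr_pred = [4.0, 5.0, 6.0], [3.8, 5.1, 6.3]
per_analyte = (rsep(trp_true, trp_pred), rsep(tyr_true, tyr_pred))
overall = rsep_mixture([trp_true, tyr_true], [trp_pred, tyr_pred])
```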

Chemometric Analysis of 2D Fluorescence Spectra for Monitoring and Modeling of Fermentation Processes (생물공정 모니터링 및 모델링을 위한 2차원 형광스펙트럼의 다변량 분석)

  • Kang Tae-Hyoung;Sohn Ok-Jae;Kim Chun-Kwang;Chung Sang-Wook;Rhee Jong-Il
    • KSBB Journal / v.21 no.1 s.96 / pp.59-67 / 2006
  • A 2D spectrofluorometer produces large amounts of spectral data during fermentation processes. The fluorescence spectra are analyzed using chemometric methods such as principal component analysis (PCA), principal component regression (PCR) and partial least squares regression (PLS). Analysis of the spectral data by PCA yields scores and loadings that are visualized in score-loading plots and used to monitor a few fermentation processes with S. cerevisiae and recombinant E. coli. Two chemometric models, based on PCR and PLS, were established to analyze the correlation between the fluorescence spectra and process variables; PLS showed slightly better calibration and prediction performance than PCR.
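
For readers unfamiliar with PCR, a minimal NumPy sketch of the general technique follows: center the spectra, project them onto the leading principal directions, and regress the response on the resulting scores. The function names `pcr_fit` and `pcr_predict` and the simulated data are assumptions for illustration and do not reproduce the models in the paper.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression: OLS of y on the first k PC scores of X."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                              # center the predictors
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                      # (p, k) loadings
    T = Xc @ V                                   # (n, k) scores
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    beta = V @ gamma                             # coefficients in the original variables
    intercept = y_mean - x_mean @ beta
    return beta, intercept

def pcr_predict(X, beta, intercept):
    return X @ beta + intercept

# Hypothetical data: 40 "spectra" with 10 channels and one process variable.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=40)
beta, b0 = pcr_fit(X, y, n_components=3)
y_hat = pcr_predict(X, beta, b0)
```

With all components retained, PCR coincides with ordinary least squares; using fewer components trades some bias for lower variance when the channels are strongly collinear, as spectral data usually are.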

Combining Radar and Rain Gauge Observations Utilizing Gaussian-Process-Based Regression and Support Vector Learning (가우시안 프로세스 기반 함수근사와 서포트 벡터 학습을 이용한 레이더 및 강우계 관측 데이터의 융합)

  • Yoo, Chul-Sang;Park, Joo-Young
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.3 / pp.297-305 / 2008
  • Recently, kernel methods have attracted great interest in the areas of pattern classification, function approximation, and anomaly detection. The kernel plays a particularly important role in methods such as the support vector machine (SVM) and kernel principal component analysis (KPCA), because it generalizes conventional linear machines so that they can handle nonlinearities efficiently. This paper considers the problem of combining radar and rain gauge observations using regression approaches based on kernel-based Gaussian processes and support vector learning. Data-assimilation results of the considered methods are reported for radar and rain gauge observations collected over a region covering parts of the Gangwon, Kyungbuk, and Chungbuk provinces of Korea, along with a performance comparison.

Analysis of Varietal Variation in Alkali Digestion of Milled Rice at Several Levels of Alkali Concentration (쌀의 KOH 농도별 붕괴양상에 따른 품종변이 해석)

  • 최해춘;손영희
    • Korean Journal of Crop Science / v.38 no.1 / pp.31-37 / 1993
  • To analyze and classify in detail the varietal variation in alkali digestibility, which is closely related to the gelatinization temperature and physical characteristics of cooked rice, the patterns of alkali decomposition across alkali concentrations were investigated for thirty-three leading Korean rice cultivars and new breeding lines (japonica: 25, Tongil-type: 8), including five glutinous rice varieties. Principal component analysis was used to condense the information and to classify the rice materials according to their decomposition patterns at several levels of potassium hydroxide (KOH) concentration. The thirty-three rice varieties were classified broadly into four groups by their distribution on the plane of the first two principal component scores, which accounted for more than 92% of the total information. Group I consisted of one variety, Dobongbyeo, which showed almost uniformly strong resistance to alkali digestion over the range of 0.8% to 1.6% KOH solutions. Group II included three japonica and Tongil-type glutinous rice varieties, which showed a medium alkali digestion value (ADV) at 1.4% KOH and an intermediate change in ADV from 0.8% to 1.6% KOH. Group III contained most of the Tongil-type and early-maturing japonica rice, which exhibited a medium-high ADV at 1.4% KOH and a large ADV difference between low and high alkali concentrations. Group IV included most of the medium or medium-late-maturing japonica, which showed a high ADV at 1.4% KOH and a medium or intermediate-high ADV change between low and high alkali concentrations. The first principal component represented the average ADV over 0.8-1.6% KOH solutions, and the second principal component represented a factor related to the ADV difference between low and high alkali solutions, or to the regression coefficient of ADV on KOH concentration.

Water Quality Characteristics of the Major Tributaries in Yeongsan and Sumjin River Basin using Statistical Analysis (통계분석을 이용한 영산강·섬진강수계 주요 유입지천의 수질 특성)

  • Park, Jinhwan;Jung, Jaewoon;Kim, Daeyoung;Kim, Kapsoon;Han, Sungwook;Kim, Hyunook;Lim, Byungjin
    • Journal of Environmental Impact Assessment / v.22 no.2 / pp.171-181 / 2013
  • In this study, we report the water quality characteristics of pollutants in 4 major tributaries of the Yeongsan and Sumjin river basins using statistical analyses such as regression equations and factor analysis. Flow rate and water quality data collected over the past 3 years from 4 sampling sites (Hwangryoung A, Jiseok A, Chooryeong A, Osu A) in the Yeongsan and Sumjin river basins were analyzed for 11 parameters (flow rate, dissolved oxygen, pH, water temperature, electrical conductivity, biochemical oxygen demand, chemical oxygen demand, total organic carbon, total nitrogen, total phosphorus, suspended solids). The results showed that the concentrations of BOD, COD, TOC, T-N, and T-P at Hwangryoung A (HW) and Jiseok A (JS) in the Yeongsan river basin decreased as the flow rate increased. This indicates that point sources, rather than nonpoint sources, affect water quality there. At Chooryeong A (CR) and Osu A (OS) in the Sumjin river basin, however, nonpoint sources rather than point sources are the important factor affecting water quality. Factor analysis was also employed to identify the principal components influencing water quality. The results revealed that the first principal component at HW was correlated with EC, DO, T-N, and water temperature, and may be interpreted as a "nitrogen influx according to seasonal pattern" factor. At JS, the first principal component was correlated with BOD, COD, and TOC and is likely to represent "organic matter" processes. At CR and OS, BOD, COD, TOC, SS, and T-P were significantly correlated, which is considered to represent "organic matter and adsorption of phosphorus on sediments influx". This study is expected to contribute to effective pollution control and management of the surface waters at the study sites.