• Title/Abstract/Keywords: principal component regression

253 search results (processing time: 0.028 s)

Water Quality Assessment and Turbidity Prediction Using Multivariate Statistical Techniques: A Case Study of the Cheurfa Dam in Northwestern Algeria

  • ADDOUCHE, Amina;RIGHI, Ali;HAMRI, Mehdi Mohamed;BENGHAREZ, Zohra;ZIZI, Zahia
    • 공업화학
    • /
    • Vol. 33, No. 6
    • /
    • pp.563-573
    • /
    • 2022
  • This work aimed to develop a new equation for turbidity (Turb) simulation and prediction using statistical methods based on principal component analysis (PCA) and multiple linear regression (MLR). For this purpose, water samples were collected monthly over a five-year period from the Cheurfa dam, an important reservoir in Northwestern Algeria, and analyzed for 12 parameters: temperature (T°), pH, electrical conductivity (EC), turbidity (Turb), dissolved oxygen (DO), ammonium ($NH_4^+$), nitrate ($NO_3^-$), nitrite ($NO_2^-$), phosphate ($PO_4^{3-}$), total suspended solids (TSS), biochemical oxygen demand ($BOD_5$), and chemical oxygen demand (COD). The results revealed strong mineralization of the water and low DO content during the summer. High levels of TSS and Turb were recorded during rainy periods. In addition, the water was loaded with phosphate ($PO_4^{3-}$) throughout the study period. The PCA results revealed ten factors, three of which were significant (eigenvalues > 1) and together explained 75.5% of the total variance. The F1 and F2 factors explained 36.5% and 26.7% of the total variance, respectively, and indicated anthropogenic pollution of domestic, agricultural, and industrial origin. The MLR turbidity simulation model exhibited a high coefficient of determination ($R^2$ = 92.20%), indicating that the model explains 92.20% of the data variability. TSS, DO, EC, $NO_3^-$, $NO_2^-$, and COD were the most significant contributing parameters (p values << 0.05) in turbidity prediction. The present study can support decision-making on the management and monitoring of the water quality of the dam, which is the primary source of drinking water in this region.
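The factor-retention step described in this abstract (keep the factors whose eigenvalues exceed 1, then report the share of variance they explain) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' analysis: the function name `kaiser_pca`, the 60 x 6 data matrix, and the random seed are all assumptions made for the example.

```python
import numpy as np

def kaiser_pca(X):
    """PCA on the correlation matrix; retain components with eigenvalue > 1."""
    # Standardize columns so the PCA operates on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = (Z.T @ Z) / (len(Z) - 1)
    # eigh returns eigenvalues in ascending order; reverse to descending.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()   # fraction of total variance
    keep = eigvals > 1.0                  # Kaiser criterion
    scores = Z @ eigvecs[:, keep]         # retained factor scores
    return eigvals, explained, keep, scores

# Hypothetical data: 60 samples, 6 correlated variables (synthetic).
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.3 * rng.normal(size=(60, 6))

eigvals, explained, keep, scores = kaiser_pca(X)
print("eigenvalues:", np.round(eigvals, 3))
print("retained factors:", int(keep.sum()))
print("variance explained by retained factors:", round(float(explained[keep].sum()), 3))
```

Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue above 1 marks a factor that carries more variance than any single standardized variable, which is the rationale behind the cutoff.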

Use of the Quantitatively Transformed Field Soil Structure Description of the US National Pedon Characterization Database to Improve Soil Pedotransfer Function

  • Yoon, Sung-Won;Gimenez, Daniel;Nemes, Attila;Chun, Hyen-Chung;Zhang, Yong-Seon;Sonn, Yeon-Kyu;Kang, Seong-Soo;Kim, Myung-Sook;Kim, Yoo-Hak;Ha, Sang-Keun
    • 한국토양비료학회지
    • /
    • Vol. 44, No. 5
    • /
    • pp.944-958
    • /
    • 2011
  • Soil hydraulic properties such as hydraulic conductivity or water retention, which are costly to measure, can be estimated indirectly with soil pedotransfer functions (PTFs) from easily obtainable soil data. The field soil structure description, which is routinely recorded, could also be used as a PTF input to reduce uncertainty. The purposes of this study were to incorporate qualitative morphological soil structure descriptions and a soil structural index into PTFs and to evaluate their contribution to the prediction of soil hydraulic properties. We transformed categorical morphological descriptions of soil structure into quantitative values using categorical principal component analysis (CATPCA). This approach was tested on a large data set from the US National Pedon Characterization database with the aid of a categorical regression tree analysis. Six different PTFs were used to predict the saturated hydraulic conductivity, and the results were averaged to quantify the uncertainty. The quantified morphological description was successively used in a multiple linear regression approach to predict the averaged ensemble saturated conductivity. The selected stepwise regression model, with only the transformed morphological variables and the structural index as predictors, predicted $K_{sat}$ with $r^2$ = 0.48 (p = 0.018), indicating the feasibility of the CATPCA approach. In the regression tree analysis, the soil structure index and soil texture turned out to be important factors in the prediction of the hydraulic properties. Among the structural descriptions, size class turned out to be an important grouping parameter in the regression tree. Bulk density, clay content, W33, and the structural index explained the clusters selected by a two-step clustering technique, implying that the morphologically described soil structural features are closely related to soil physical as well as hydraulic properties. Although this study provides a relatively new method that relates the soil structure description to the soil structure index, the same approach should be tested using datasets that contain actual measurements of hydraulic properties. More insight into the power of the soil structure index to predict hydraulic properties would be gained by considering the measured saturated hydraulic conductivity and soil water retention.

Chemical Oxygen Demand (COD) Model for the Assessment of Water Quality in the Han River, Korea

  • Kim, Jae Hyoun;Jo, Jinnam
    • 한국환경보건학회지
    • /
    • Vol. 42, No. 4
    • /
    • pp.280-292
    • /
    • 2016
  • Objectives: The objective of this study was to build COD regression models for the Han River and evaluate water quality. Methods: Water quality data sets for the dry season (as of January) during a four-year period (2012-2015) were collected from the database of the Han River automatic water quality monitoring stations. Statistical techniques, including a combined genetic algorithm-multiple linear regression (GA-MLR), were used to build five-descriptor COD models. Multivariate statistical techniques such as principal component analysis (PCA) and cluster analysis (CA) are useful tools for extracting meaningful information. Results: The $r^2$ values of the best COD models were significantly high (> 0.8) between 2012 and 2015. Total organic carbon (TOC) was a surrogate indicator for COD (as COD/TOC) with high reliability ($r^2=0.63$ in 2012, $r^2=0.75$ in 2013, $r^2=0.79$ in 2014, and $r^2=0.85$ in 2015). The COD/TOC ratios were calculated as 2.08 in 2012, 1.79 in 2013, 1.52 in 2014, and 1.45 in 2015, indicating that biodegradability in the water body of the Han River was being sustained, thereby further improving water quality. The BOD/COD ratio supported these findings. The cluster analysis revealed higher annual levels of microorganisms and phosphorus at stations along the Hangang-Seoul and Hantangang areas. Nevertheless, the overall water quality over the last four years showed an observable trend toward continuous improvement. These findings also suggest that non-point pollution control strategies should consider the influence of upstream and downstream areas to protect water quality in the Han River. Conclusion: This data analysis procedure provided an efficient and comprehensive tool for interpreting complex water quality data matrices. Results from a trend analysis provided important information about sources and parameters for Han River water quality management.

Evaluation of a Traditional Korean Medicine Content Factor and Satisfaction with the Drama "Daejanggeum"

  • 김송이;김호선;남민호;리위에쮜엔;쩡홍치앙;박히준;이혜정;채윤병
    • Journal of Acupuncture Research
    • /
    • Vol. 27, No. 1
    • /
    • pp.11-20
    • /
    • 2010
  • Objectives: This study evaluated traditional Korean medicine content in the drama "Daejanggeum". Methods: One hundred sixty-nine participants in Taiwan responded to a 10-item survey on the components of the success of the drama "Daejanggeum". Principal component factor analysis and multiple regression analysis were performed to identify factors associated with satisfaction with watching the drama. Results: Factor analysis revealed that the dramatic factor (44.8%), content factor (12.3%), and cultural factor (11.3%) were the most important factors in the success of the drama. Multiple regression analysis showed that the dramatic factor (beta = .342), content factor (beta = .278), and cultural factor (beta = .131) were associated with satisfaction with watching the drama ($R^2$ = .394, F = 32.280, p < .001). Conclusions: This study demonstrated that the dramatic, content, and cultural factors are the most important factors associated with satisfaction with the drama "Daejanggeum" in Taiwan. These findings suggest that traditional Korean medicine as a content factor would be very influential in enhancing a drama's chances of success.

An Analysis of the Economic Effects of R&D Investment in the IT Industry

  • 홍재표;최나린;김방룡
    • 한국통신학회논문지
    • /
    • Vol. 37B, No. 9
    • /
    • pp.837-848
    • /
    • 2012
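  • This study analyzes the effect of R&D investment in the IT industry on value added. The IT industry is subdivided into broadcasting and communications equipment, information equipment, and electronic components, and a multiple regression is run for each sub-industry with capital stock, labor input, and R&D stock as independent variables. In every sector the t-values and R-square values were significant, but autocorrelation was very high. In addition, the coefficient on R&D stock in the information equipment industry and the coefficient on labor input in the electronic components industry were negative, which pointed to possible multicollinearity. The study resolved the autocorrelation and multicollinearity problems with the Cochrane-Orcutt procedure and principal component regression. The analysis of the effect of R&D stock on value added estimated that R&D investment in the broadcasting and communications equipment industry has a far larger effect than R&D investment in the information equipment or electronic components industries.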
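The Cochrane-Orcutt procedure named in this abstract corrects a regression whose errors follow a first-order autoregressive process. The sketch below shows the textbook iteration on synthetic data; it is not the authors' model, and the function names, the single predictor, and the simulated AR(1) coefficient of 0.7 are assumptions chosen for illustration.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients via a least-squares solve."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def cochrane_orcutt(X, y, n_iter=20, tol=1e-8):
    """Iterative Cochrane-Orcutt estimation for AR(1) errors.

    X must contain an intercept column of ones. Returns (beta, rho),
    with beta expressed on the original (untransformed) scale.
    """
    beta = ols(X, y)
    rho = 0.0
    for _ in range(n_iter):
        resid = y - X @ beta
        # Estimate rho by regressing e_t on e_{t-1} (no intercept).
        rho_new = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
        # Quasi-difference the data; the intercept column becomes (1 - rho),
        # so the fitted coefficient on it is the original intercept.
        y_star = y[1:] - rho_new * y[:-1]
        X_star = X[1:] - rho_new * X[:-1]
        beta = ols(X_star, y_star)
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return beta, rho

# Synthetic example: y = 1 + 2x + e, where e follows an AR(1) process.
rng = np.random.default_rng(1)
n = 400
x = rng.normal(size=n)
shocks = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + shocks[t]
y = 1.0 + 2.0 * x + e
X = np.column_stack([np.ones(n), x])

beta, rho = cochrane_orcutt(X, y)
print("estimated slope:", round(float(beta[1]), 3))
print("estimated rho:", round(float(rho), 3))
```

The multicollinearity side of the abstract would be handled separately, for instance by regressing on principal component scores instead of the raw, highly correlated inputs.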

Variable Selection for Multi-Purpose Multivariate Data Analysis

  • 허명회;임용빈;이용구
    • 응용통계연구
    • /
    • Vol. 21, No. 1
    • /
    • pp.141-149
    • /
    • 2008
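  • A recent trend in multivariate data analysis is that, in addition to a growing number of observations n, cases with a large number of variables p are becoming more common. Among the p variables $X_1$, $X_2$, $\ldots$, $X_p$ obtained for each of the n observations, some may be distinguishable by name or concept yet nearly redundant in practice, and including all of them in the analysis can cause several problems. For example, redundant variables can distort the determination of the principal axes in principal component analysis or factor analysis, and the computation of distances between observations in clustering. In supervised learning, where a target variable is designated, redundancy among the explanatory variables undermines the stability of the estimated model. Because in practice a single data set is explored with several techniques and many models are derived from it, the variable set should be as parsimonious as possible. The purpose of this study is to propose a method that replaces a given variable set with a smaller one by selecting the necessary variables from $X_1$, $X_2$, $\ldots$, $X_p$ and removing the unnecessary ones. By applying the proposed method to several numerical examples, we discuss visualization of the relationship between selected and removed variables, its usefulness in regression models, and its use in categorical data analysis.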

Discrimination of African Yams Containing High Functional Compounds Using FT-IR Fingerprinting Combined by Multivariate Analysis and Quantitative Prediction of Functional Compounds by PLS Regression Modeling

  • 송승엽;지은이;안명숙;김동진;김인중;김석원
    • 원예과학기술지
    • /
    • Vol. 32, No. 1
    • /
    • pp.105-114
    • /
    • 2014
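  • In this study, a rapid selection system for African yams with high functional compound content was established by applying multivariate statistical analysis to total carotenoid, flavonoid, and phenolic content data measured with a UV-VIS spectrophotometer and to FT-IR spectral data. The total carotenoid content of the 62 African yams ranged from 0.01 to 0.91 ${\mu}g{\cdot}g^{-1}$ dry wt. Total flavonoid and phenolic contents ranged from 12.9 to 229.0 ${\mu}g{\cdot}g^{-1}$ dry wt and from 0.29 to 5.2 $mg{\cdot}g^{-1}$ dry wt, respectively. The African yams showed important spectral variation in the 1700-1500, 1500-1300, and 1100-950 $cm^{-1}$ regions of the FT-IR spectrum. These regions reflect, respectively, qualitative and quantitative information on amino acid and protein compounds including amide I and II, on nucleic acids and phospholipids containing phosphodiester groups, and on carbohydrates including monosaccharides and complex polysaccharides. In the PCA and PLS-DA analyses, the 62 African yams clustered into three groups of closely related species. PLS regression was performed between the FT-IR spectral data of the African yams and the total carotenoid, flavonoid, and phenolic content data measured with the UV-VIS spectrophotometer. The correlation coefficients ($R^2$) between the measured and predicted values of total carotenoid, flavonoid, and phenolic content were 0.83, 0.86, and 0.72, respectively. These results show that total carotenoid, flavonoid, and phenolic contents of African yams can be predicted from FT-IR spectra. The metabolite-level modeling established in this study for predicting the content of useful functional compounds in African yams is expected to serve as a means of rapid selection of cultivars and lines.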
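PLS regression of a single measured content value on many spectral channels can be sketched with the standard NIPALS form of PLS1. The example below runs on synthetic "spectra" and is not the authors' pipeline: the function names `pls1_fit` and `pls1_predict`, the 62 x 20 matrix, the three latent components, and the noise levels are all assumptions for illustration.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 (single response) via NIPALS. Returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))  # weight vectors
    P = np.zeros((n_features, n_components))  # X loadings
    q = np.zeros(n_components)                # y loadings
    for k in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w                            # score vector
        tt = t @ t
        P[:, k] = Xk.T @ t / tt
        q[k] = yk @ t / tt
        W[:, k] = w
        # Deflate X and y before extracting the next component.
        Xk = Xk - np.outer(t, P[:, k])
        yk = yk - q[k] * t
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def pls1_predict(X, coef, x_mean, y_mean):
    return (X - x_mean) @ coef + y_mean

# Synthetic stand-in for spectra: 62 samples, 20 channels, 3 latent factors.
rng = np.random.default_rng(2)
latent = rng.normal(size=(62, 3))
spectra = latent @ rng.normal(size=(3, 20)) + 0.05 * rng.normal(size=(62, 20))
content = latent @ np.array([1.0, -0.5, 0.8]) + 0.1 * rng.normal(size=62)

coef, x_mean, y_mean = pls1_fit(spectra, content, n_components=3)
pred = pls1_predict(spectra, coef, x_mean, y_mean)
ss_res = np.sum((content - pred) ** 2)
ss_tot = np.sum((content - content.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print("in-sample R^2:", round(float(r2), 3))
```

The $R^2$ printed here is an in-sample value on synthetic data; in practice the number of components would usually be chosen by cross-validation.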

Prediction of golf scores on the PGA tour using statistical models

  • 임정은;임영인;송종우
    • 응용통계연구
    • /
    • Vol. 30, No. 1
    • /
    • pp.41-55
    • /
    • 2017
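  • Golf has recently become established as a hobby for many people, and golf-related research is being conducted in a variety of ways. In this study, we use data mining techniques to predict the average score of players on the PGA Tour and to identify the variables that significantly affect the score. In addition, we aim to predict the top 10 and top 25 players in four PGA Tour playoffs. We predict the average score with a variety of linear and nonlinear regression methods: the linear methods are stepwise selection, all possible regressions, LASSO, ridge regression, and principal component regression; the nonlinear methods are trees (CART), bagging, gradient boosting, neural networks, random forests, and K-nearest neighbors (KNN). Among the variables selected in common by most models, players' average scores rise with fairway firmness, the height of the grass on the green, and average maximum wind speed. Conversely, average scores tend to fall as the number of one-putts, the scrambling variables (making birdie or eagle after missing the green in regulation), and longest drive (which represents the ability to hit the ball far) increase. All 11 models showed low error rates in predicting the test data, the 2015 results, but bagging and random forests predicted best, and both models achieved a fairly high hit rate when predicting the top-10 and top-25 rankings.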

An intelligent sensor system with reconstruction mechanism of faulty signal

  • Jung, Young-Su;Hyun, Woong-Keun;Yoon, In-Mo;Jung, Young-Kee;Kim, C.S.;Kim, Nam-Ho
    • 제어로봇시스템학회: Conference Proceedings
    • /
    • 제어로봇시스템학회, ICCAS 2003
    • /
    • pp.1231-1234
    • /
    • 2003
  • A sensor working outdoors may generate faulty signals owing to dust and high temperature. This paper describes an intelligent sensor system and controller with a reconstruction mechanism for faulty signals. The faulty signals are divided into two types: linear distortion and nonlinear distortion. Linear distortion is caused by dust, and nonlinear distortion by physical breakdown of the sensor or by high temperature. The distorted signals are reconstructed by the proposed method, which is based on polynomial regression and principal component analysis. The proposed method was applied to a sun tracking system working outdoors. For robust and precise control of the sun tracker, a fuzzy controller is also proposed. The fuzzy controller controls the tracker using the collected sensor signals. The tolerance of the position control is within 1.5 degrees. To show the validity of the developed system, field experiments are presented.

Enhancement of the Virtual Metrology Performance for Plasma-assisted Processes by Using Plasma Information (PI) Parameters

  • Park, Seolhye;Lee, Juyoung;Jeong, Sangmin;Jang, Yunchang;Ryu, Sangwon;Roh, Hyun-Joon;Kim, Gon-Ho
    • 한국진공학회: Conference Proceedings
    • /
    • 한국진공학회, 49th Summer Annual Conference 2015, Book of Abstracts
    • /
    • pp.132-132
    • /
    • 2015
  • A virtual metrology (VM) model based on plasma information (PI) parameters is developed for C4F8 plasma-assisted oxide etching processes to predict and monitor process results such as the etching rate with improved performance. To apply fault detection and classification (FDC) or advanced process control (APC) models efficiently to real mass production lines, a high-performance VM model is required, and principal component regression (PCR) is a preferred technique for VM modeling, although it requires a large number of data sets to obtain statistically guaranteed accuracy. In this study, PI parameters are introduced as an effective way to include parameters that represent 'good information' in the VM model and are applied to etch rate prediction. By adopting the PI parameters (b- and q-factors and surface passivation parameters) as PCs in the PCR-based VM model, information about the reactions in the plasma volume, surface, and sheath regions can be efficiently included in the VM model; thus, VM performance is secured even when the available data set is insufficient. For mass production data of 350 wafers, the developed PI-based VM (PI-VM) model satisfied the prediction accuracy required by industry for the C4F8 plasma-assisted oxide etching process.
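For orientation, textbook principal component regression (the baseline that the abstract builds on) can be sketched as below. This is the generic data-driven version, not the PI-VM model, which substitutes physically derived PI parameters for the components. The function names, the eight synthetic "sensor" columns, the loading values, and the response coefficients are assumptions; only the sample count of 350 is borrowed from the abstract.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression on standardized predictors.

    Returns (coef_std, x_mean, x_std, y_mean), where coef_std applies
    to the standardized predictors.
    """
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - x_mean) / x_std
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_components].T
    T = Z @ V                                  # component scores
    y_mean = y.mean()
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    coef_std = V @ gamma                       # map back to predictor space
    return coef_std, x_mean, x_std, y_mean

def pcr_predict(X, coef_std, x_mean, x_std, y_mean):
    return ((X - x_mean) / x_std) @ coef_std + y_mean

# Synthetic stand-in: 350 samples, 8 collinear signals driven by 2 factors.
rng = np.random.default_rng(3)
latent = rng.normal(size=(350, 2))
loadings = np.array([[1.0, 1.0, 1.0, 1.0, 0.2, 0.1, -0.3, 0.0],
                     [0.0, 0.2, -0.1, 0.3, 1.0, 1.0, 1.0, 1.0]])
sensors = latent @ loadings + 0.1 * rng.normal(size=(350, 8))
rate = 3.0 * latent[:, 0] - 2.0 * latent[:, 1] + 0.2 * rng.normal(size=350)

model = pcr_fit(sensors, rate, n_components=2)
pred = pcr_predict(sensors, *model)
r2 = 1.0 - np.sum((rate - pred) ** 2) / np.sum((rate - rate.mean()) ** 2)
print("in-sample R^2 with 2 components:", round(float(r2), 3))
```

Keeping only the leading components discards the low-variance directions along which collinear predictors make ordinary least squares unstable, at the cost of some bias if the response depends on a discarded direction.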