• Title/Summary/Keyword: Multivariate Correlation Analysis

Missing Value Estimation and Sensor Fault Identification using Multivariate Statistical Analysis (다변량 통계 분석을 이용한 결측 데이터의 예측과 센서이상 확인)

  • Lee, Changkyu;Lee, In-Beum
    • Korean Chemical Engineering Research / v.45 no.1 / pp.87-92 / 2007
  • Recently, the development of process monitoring systems to detect and diagnose process abnormalities has attracted attention in process systems engineering. Normal operating data obtained from processes provide useful information on process characteristics for modeling, monitoring, and control. Since modern chemical and environmental processes exhibit high dimensionality, strong correlation, severe dynamics, and nonlinearity, it is not easy to analyze a process through a model-based approach. To overcome the limitations of the model-based approach, many system engineers and academic researchers have focused on statistical approaches combined with multivariate analysis, such as principal component analysis (PCA) and partial least squares (PLS). Several multivariate analysis methods have been modified for application to chemical processes with specific characteristics such as dynamics and nonlinearity. This paper discusses missing value estimation and sensor fault identification based on process variable reconstruction using dynamic PCA and canonical variate analysis.
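
The reconstruction idea in this abstract can be illustrated with a minimal sketch. It assumes ordinary (static) PCA rather than the dynamic PCA or canonical variate analysis used in the paper, and the function names `fit_pca` and `reconstruct_missing` are hypothetical. A missing sensor reading is estimated as the value that minimizes the squared prediction error of the sample with respect to a PCA model fitted on normal data.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA on autoscaled normal-operation data; return (mean, std, loadings)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    # Rows of Vt are the principal directions of the scaled data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_components].T

def reconstruct_missing(x, mean, std, P):
    """Fill NaN entries of one sample by minimizing the squared prediction error."""
    miss = np.isnan(x)
    obs = ~miss
    z_obs = (x[obs] - mean[obs]) / std[obs]
    # R projects a scaled sample onto the residual subspace of the PCA model.
    R = np.eye(x.size) - P @ P.T
    # Setting the gradient of z^T R z with respect to the missing part to zero
    # gives R_mm z_m = -R_mo z_o.
    z_miss = -np.linalg.solve(R[np.ix_(miss, miss)], R[np.ix_(miss, obs)] @ z_obs)
    filled = x.copy()
    filled[miss] = z_miss * std[miss] + mean[miss]
    return filled
```

In the same spirit, a sensor whose measured value differs strongly from its reconstructed value would be a candidate for a faulty sensor, although the thresholds for such a decision are not given in the abstract.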

Comparison of Significant Term Extraction Based on the Number of Selected Principal Components (주성분 보유수에 따른 중요 용어 추출의 비교)

  • Lee Chang-Beom;Ock Cheol-Young;Park Hyuk-Ro
    • The KIPS Transactions:PartB / v.13B no.3 s.106 / pp.329-336 / 2006
  • In this paper, we propose a method for extracting significant terms from a document. The technique used is principal component analysis (PCA), one of the multivariate analysis methods. PCA can make full use of term-term relationships within a document through term-term correlations. We perform PCA on the correlation matrix between terms instead of the covariance matrix. We also try to determine thresholds for both the number of components to be selected and the correlation coefficients between the selected components and terms. Experimental results on 283 Korean newspaper articles show that using the first six components with an absolute correlation coefficient threshold of 0.4 is the best condition for extracting sentences based on the selected significant terms.
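
A minimal sketch of this selection rule follows. It assumes the input is a sentence-by-term frequency matrix with no constant columns (the abstract does not state the exact input representation), and the function name `significant_terms` is hypothetical. The correlation between a term and a component is computed as the eigenvector entry scaled by the square root of the eigenvalue.

```python
import numpy as np

def significant_terms(X, terms, n_components=6, threshold=0.4):
    """Return terms whose |correlation| with any retained component meets the threshold.

    X is an (observations x terms) matrix; PCA is run on its correlation matrix.
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    # Component-term correlations: eigenvector entries times sqrt(eigenvalue).
    loadings = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))
    keep = (np.abs(loadings) >= threshold).any(axis=1)
    return [term for term, k in zip(terms, keep) if k]
```

The defaults of six components and a threshold of 0.4 mirror the best condition reported in the abstract; retaining all components would keep every term, so the number of components matters as much as the threshold.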

A Study on the Fuel Economy based on the Driving Patterns for Passenger Car in the Metropolitan Area (승용차 도심 주행패턴에 의한 연비 성능 분석)

  • 정남훈;이우택;선우명호
    • Transactions of the Korean Society of Automotive Engineers / v.11 no.1 / pp.25-31 / 2003
  • Many factors influence automobile fuel economy, such as average speed, average acceleration, and acceleration sum per kilometer. In this study, various driving data were recorded during road tests. The accumulated road test mileage in the Seoul metropolitan area is around 1,300 kilometers. The data were analyzed by multivariate statistical techniques including correlation analysis, principal component analysis, and multiple linear regression analysis. The results show that the average trip time per kilometer is one of the most important factors for fuel consumption and that increasing the average speed is desirable for reducing emissions and fuel consumption.

Penalized least distance estimator in the multivariate regression model (다변량 선형회귀모형의 벌점화 최소거리추정에 관한 연구)

  • Jungmin Shin;Jongkyeong Kang;Sungwan Bang
    • The Korean Journal of Applied Statistics / v.37 no.1 / pp.1-12 / 2024
  • In many real-world data sets, multiple response variables depend on the same set of explanatory variables. In particular, if several response variables are correlated with each other, simultaneous estimation that accounts for this correlation may be more effective than analyzing each response variable individually. In such multivariate regression analysis, the least distance estimator (LDE) estimates the regression coefficients simultaneously by minimizing the distance between each training observation and its estimate in a multidimensional Euclidean space. It is also robust to outliers. In this paper, we examine the least distance estimation method in multivariate linear regression and, furthermore, present the penalized least distance estimator (PLDE) for efficient variable selection. We propose the LDE with an adaptive group LASSO penalty term (AGLDE), which can reflect the correlation between response variables in the model and efficiently select variables according to the importance of the explanatory variables. The validity of the proposed method was confirmed through simulations and real data analysis.
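
The unpenalized LDE described above minimizes the sum of Euclidean (not squared) residual norms over observations. The sketch below solves this with iteratively reweighted least squares, which is an assumption made for illustration and not necessarily the algorithm used in the paper; the adaptive group LASSO penalty is omitted, and the function name `least_distance_estimator` is hypothetical.

```python
import numpy as np

def least_distance_estimator(X, Y, n_iter=200, eps=1e-8, tol=1e-10):
    """Minimize sum_i ||y_i - B^T x_i||_2 over B by iteratively reweighted least squares.

    X is (n x p), Y is (n x q); the returned B is (p x q).
    """
    B = np.linalg.lstsq(X, Y, rcond=None)[0]        # start from ordinary least squares
    for _ in range(n_iter):
        resid_norm = np.linalg.norm(Y - X @ B, axis=1)
        w = 1.0 / np.maximum(resid_norm, eps)       # large residuals get small weights
        Xw = X * w[:, None]
        B_new = np.linalg.solve(X.T @ Xw, Xw.T @ Y) # weighted least squares step
        converged = np.max(np.abs(B_new - B)) < tol
        B = B_new
        if converged:
            break
    return B
```

Because each observation contributes through the norm of its whole residual vector, all responses are fitted jointly, and observations with large residuals are down-weighted, which is where the robustness to outliers comes from.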

Geochemistry of the Moisan Epithermal Gold-silver Deposit in Haenam Area (해남 모이산 천열수 금은광상의 지구화학적 특성)

  • Moon, Dong-Hyeok;Koh, Sang-Mo;Lee, Gill-Jae
    • Economic and Environmental Geology / v.43 no.5 / pp.491-503 / 2010
  • Geochemical characteristics of the Moisan epithermal gold-silver deposit in the Haenam area, Jeollanam-do, were studied using multivariate statistical analysis (correlation analysis, factor analysis, and cluster analysis) of a total of 140 samples. The correlation analysis reveals that Ag, Cu, Bi, and Te are highly correlated with Au in both the non-mineralized and mineralized zones. This results from the presence of Au-Ag-bearing minerals (electrum, sylvanite, calaverite, and stützite) and minerals that do not contain Au or Ag (chalcopyrite, tellurobismuthite, and bismuthinite). Mo shows a much higher correlation in the mineralized zone (0.615) than in the non-mineralized zone (0.269), which implies that Mo content is strongly affected by Au mineralization. While Mn, Cs, Fe, and Se are correlated with Au in the non-mineralized zone, they show a negative correlation in the mineralized zone. They therefore appear to be elements eluviated from the host rock during gold mineralization. Sb is enriched during gold mineralization, showing a high correlation in the mineralized zone and a negative correlation in the non-mineralized zone. According to the factor analysis, Se, Ag, Cs, and Te are indicators of the presence of gold mineralization in the non-mineralized zone because they are strongly affected by gold content. In the mineralized zone, on the other hand, Mo and Te are indicators of gold mineralization, and Sb and Cu are indicators of silver mineralization. The cluster analysis reveals that Cd-Zn-Pb-S, Bi-Fe-Cu-Mn, Se-Te-Au-Cs-Ag, and As-Sb-Ba are the groups of elements with similar behavior in the non-mineralized zone, whereas Cd-Zn-Mn-Pb, Fe-S-Se, As-Bi-Cs, Ag-Sb-Cu, and Au-Te-Mo are the corresponding groups in the mineralized zone. Multivariate statistical analysis of this kind makes it possible to compare the behavior of the minerals present and the differences in geochemical characteristics between the mineralized and non-mineralized zones. It is therefore expected to be a useful tool for exploration of similar deposit types.

A Study on Characteristics of Water Quality using Multivariate Analysis in Sumjin River Basin (다변량 분석법을 이용한 섬진강 수계의 수질 특성 연구)

  • Park, Jinhwan;Moon, Myungjin;Lee, Hyungjin;Kim, Kapsoon
    • Journal of Korean Society on Water Environment / v.30 no.2 / pp.119-127 / 2014
  • The objective of this study is to evaluate and analyze the water environment system of the Sumjin River Basin, where improvement of water quality is needed. Data were collected from January 2010 to December 2012 and included water temperature, pH, DO, EC, $BOD_5$, COD, TOC, SS, T-N, and T-P. The data were analyzed using correlation analysis, principal component analysis, and factor analysis. The results were as follows. Correlation analysis of $BOD_5$ against COD and TOC gave correlation coefficients of 0.715 and 0.719, respectively, indicating a strong relationship. The same analysis of T-P against $BOD_5$, COD, TOC, and SS gave correlation coefficients between 0.482 and 0.608, indicating a moderate relationship. The correlation analysis showed that all parameters except T-N are highly correlated. Principal component analysis of the overall water quality extracted three principal components, with a cumulative contribution rate of 68.990%. For seasonal water quality, four principal components each were extracted for spring and summer, with cumulative contribution rates of 81.515% and 73.550%, respectively. Three principal components each were extracted for fall and winter, with cumulative contribution rates of 65.072% and 72.721%, respectively. Factor analysis showed no seasonality; the first common factor comprised $BOD_5$, COD, TOC, SS, and T-P. Overall, the Sumjin River Basin water system is strongly affected by nutrient salts, organic matter, and suspended solids at the same time.
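
The cumulative contribution rate quoted above is the running sum of the correlation-matrix eigenvalues divided by their total. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the eigenvalue-greater-than-one retention rule is a common convention that the abstract does not confirm was used.

```python
import numpy as np

def cumulative_contribution(X):
    """Return (eigenvalues, cumulative contribution rates) from PCA on the correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(R)[::-1]           # descending order
    return eigvals, np.cumsum(eigvals) / eigvals.sum()

def n_components_kaiser(eigvals):
    """Count components with eigenvalue > 1 (assumed retention rule, not from the source)."""
    return int(np.sum(eigvals > 1.0))
```

For a matrix of water quality parameters with one column per parameter, the entry of the cumulative array at the retained number of components corresponds to figures such as the 68.990% reported for three components.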

Agronomic Characteristics of Introduced Triticales

  • Cho, Chang-Hwan;Yun, Seung-Gil;Kazuo, Ataku;Taiki, Yoshihira
    • KOREAN JOURNAL OF CROP SCIENCE / v.43 no.1 / pp.6-10 / 1998
  • This study was conducted to obtain basic information for the development of new triticale cultivars with good quality and high productivity for soiling feed. Twelve cultivars introduced from Poland and Canada and two cultivars developed in Korea were planted in the experimental field at Ansong National University in 1995. Major growth traits and nutrient components for feed were measured and analyzed using principal component analysis and average linkage cluster analysis. 'Prego', 'Prag 46/3', and 'Clercal' were relatively high in forage yield. Most forage nutrient contents except cellulose were higher in Prego, Clercal, and 'Cumulus' than in the other cultivars. Principal component analysis of 11 traits, including forage yield and nutrient contents, showed that 72.59% of the total variation was explained by the first and second principal components. $Z_1$ was highly correlated with the contents of forage nutrient components, and $Z_2$ with plant height, fresh weight, and dry weight. The fourteen cultivars were classified into 7 groups by multivariate analysis. Clercal and Prego in Group I could be useful sources for the improvement of triticale as an important forage crop because they exhibited high productivity as well as high contents of nutrient components for feed.

School Safety Education Factors Predicting Injury Prevalence Among Korean Adolescence (학교의 안전교육 관련 특성이 청소년의 사고발생 예측에 미치는 영향)

  • 이명선;박경옥
    • Korean Journal of Health Education and Promotion / v.21 no.2 / pp.147-165 / 2004
  • Injury is a leading cause of death in the child and adolescent populations. In particular, more than 80% of unintentional injuries were related to risk-taking behaviors involved in diverse accidents around school and home. Therefore, educational approaches should be provided for children and adolescents, and schools are essential and appropriate sites for safety education. This study was conducted to identify injury prevalence and safety education at schools among middle and high school students in Korea. About 1,034 middle and high school students in 28 schools participated in a self-administered survey. The target schools were selected by stratified random sampling from schools in seven metropolitan cities in Korea. The questionnaires were delivered to the vice-principals by ground mail, and the vice-principals administered the survey data collection. The questionnaire asked about safety education provided in schools, injury experience in the last year, the need for injury prevention classes in school, and demographics. All survey responses were entered into an SPSS worksheet. Multivariate analysis of variance (MANOVA) and descriptive discriminant analysis (DDA) were performed with SPSS software 11.1, with MANOVA conducted as a preliminary analysis for DDA. According to the MANOVA results, gender (male), grade (poor), living with both parents, and displaying injury prevention messages on the school news board differed significantly between the injured and uninjured student groups (p = .00). These four factors also had significant effects on students' injury experience in DDA, although their correlation with injury experience was weak overall based on their canonical function coefficients. All structure coefficients of the four factors were greater than .30, which means the four factors have discriminant effects on injury prevalence. In descending order of discriminant effect size, the factors were gender, grade, living with both parents, and safety message display on school news boards.

Assessment and spatial variation of water quality using statistical techniques: Case study of Nakdong river, Korea

  • Kim, Shin
    • Membrane and Water Treatment / v.13 no.5 / pp.245-257 / 2022
  • Water quality characteristics and their spatial variations in the Nakdong River were statistically analyzed by multivariate techniques including correlation analysis, cluster analysis (CA), and factor analysis/principal component analysis (FA/PCA), based on water quality parameters for 17 sites over 2017-2019, yielding PI values for primary factors. Site 10 showed the highest parameter concentrations, and the results of Pearson's correlation analysis suggest that non-biodegradable organic matter was distributed at the site. Five clusters were identified in order of descending pollution level: I (Ib > Ia) > II (IIa > IIb) > III. Spatial variations started from sub-cluster Ib, where Daegu city and the Geumho River join. T-P, PO4-P, SS, COD, and TOC corresponded to VF 1 and 2, which were found to be principal components with a strong influence on water quality. Sub-cluster Ib was strongly influenced by NO3-N and T-N compared to the other clusters. According to the PIs, water quality deteriorated due to non-biodegradable organic matter and nitrogen- and phosphorus-based nutrient salts in the middle and lower reaches, illustrating worsening water pollution due to inflows from anthropogenic sources on the Geumho River, i.e., sewage and wastewater discharged from Site 10, where urban, agricultural, and industrial areas are concentrated.

Wavelength selection by loading vector analysis in determining total protein in human serum using near-infrared spectroscopy and Partial Least Squares Regression

  • Kim, Yoen-Joo;Yoon, Gil-Won
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference / 2001.06a / pp.4102-4102 / 2001
  • In multivariate analysis, an absorbance spectrum is measured over a band of wavelengths, and the size of this wavelength band often receives little attention. However, it is desirable to measure the spectrum only at the necessary wavelengths as long as acceptable prediction accuracy can be met. In this paper, a method of selecting an optimal band of wavelengths based on loading vector analysis is proposed and applied to determining total protein in human serum using near-infrared transmission spectroscopy and partial least squares regression (PLSR). Loading vectors in the full-spectrum PLSR were used as the reference for selecting wavelengths, but only the first loading vector was used since it explains the spectrum best. Absorbance spectra of sera from 97 outpatients were measured at 1530-1850 nm with an interval of 2 nm. Total protein concentrations of the sera ranged from 5.1 to 7.7 g/dL. Spectra were measured with a Cary 5E spectrophotometer (Varian, Australia). Serum in a 5 mm pathlength cuvette was placed in the sample beam and air in the reference beam. Full-spectrum PLSR was applied to determine total protein from the sera. Next, the wavelength region of 1672-1754 nm was selected based on the first loading vector analysis. The standard error of cross validation (SECV) of full-spectrum PLSR (1530-1850 nm) and selected-wavelength PLSR (1672-1754 nm) was 0.28 and 0.27 g/dL, respectively, so the prediction accuracy of the two bands was essentially equal. Wavelength selection based on the loading vector in PLSR appeared simple and robust in comparison with other methods based on the correlation plot, the regression vector, and the genetic algorithm. As a reference for wavelength selection in PLSR, the loading vector has an advantage over the correlation plot since the former is based on a multivariate model whereas the latter is based on a univariate model. Wavelength selection by first loading vector analysis requires less computation time than selection by genetic algorithm and does not require smoothing.
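
The selection step can be sketched as follows. The abstract does not state the exact criterion applied to the first loading vector, so the rule used here (keep wavelengths whose absolute loading is at least a given fraction of the maximum) is an assumption, as are the function names `first_pls_loading` and `select_band`. The first component is computed as in the NIPALS form of PLS1 for a single response.

```python
import numpy as np

def first_pls_loading(X, y):
    """Return the first X-loading vector of a PLS1 model (one entry per wavelength)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc                  # weight vector: covariance of each wavelength with y
    w = w / np.linalg.norm(w)
    t = Xc @ w                     # scores of the first latent variable
    return Xc.T @ t / (t @ t)      # loading vector: regression of X on the scores

def select_band(wavelengths, loading, fraction=0.5):
    """Keep wavelengths whose |loading| reaches a fraction of the maximum (assumed rule)."""
    keep = np.abs(loading) >= fraction * np.abs(loading).max()
    return wavelengths[keep]
```

A PLSR model refitted on the columns returned by `select_band` could then be compared with the full-spectrum model through cross validation, as the abstract does with SECV.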
