• Title/Summary/Keyword: principal component regression


Affecting Factors on the Variation of Atmospheric Concentration of Polycyclic Aromatic Hydrocarbons in Central London

  • Baek, Sung-Ok;Roger Perry
    • Journal of Korean Society for Atmospheric Environment / v.10 no.E / pp.343-356 / 1994
  • In this study, a statistical investigation was carried out to evaluate relationships between polycyclic aromatic hydrocarbons (PAHs) associated with ambient aerosols and other air quality parameters under varying meteorological conditions. Daily measurements of PAHs and air quality/meteorological parameters were selected from a database built from comprehensive air monitoring in London during 1985-1987. Correlation coefficients were calculated to examine significant relationships between the PAHs and other individual variables. A principal component analysis was then performed on the air quality/meteorological data set to derive the important factors underlying the interactions among the variables. Six components were identified, representing vehicle emission, photochemical activity/volatilization, space heating, atmospheric humidity, atmospheric stability, and wet deposition. A stepwise multiple regression analysis showed that the vehicle emission component is overall the most important factor contributing to the variability of PAH concentrations at the monitoring site. The photochemical activity/volatilization component also appeared to be important, particularly for the lower molecular weight PAHs. In general, the space heating component was the next most important factor, while in most cases the other three components contributed less to the variance of each PAH than the first three. However, these three components were consistently negatively correlated with the PAH data, indicating their role in depleting PAH concentrations in the urban atmosphere.
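Extracting principal components from the explanatory variables and then regressing a response on the component scores is principal component regression, the keyword of this search. A minimal Python sketch follows, assuming just two predictors and one retained component so that the leading eigenvector of the 2x2 covariance matrix has a closed form. The function names `pcr_fit` and `pcr_predict` and the toy data are hypothetical; the study itself retained six components and used stepwise selection.

```python
import math


def pcr_fit(x1, x2, y):
    """Principal component regression with two predictors, keeping PC1 only."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    # Sample covariance matrix [[a, b], [b, c]] of the centred predictors.
    a = sum(v * v for v in d1) / (n - 1)
    c = sum(v * v for v in d2) / (n - 1)
    b = sum(u * v for u, v in zip(d1, d2)) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    lam1 = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    # Matching eigenvector: (b, lam1 - a) when b != 0, else the higher-variance axis.
    if b != 0:
        vec = (b, lam1 - a)
    elif a >= c:
        vec = (1.0, 0.0)
    else:
        vec = (0.0, 1.0)
    norm = math.hypot(vec[0], vec[1])
    v1 = (vec[0] / norm, vec[1] / norm)
    # PC1 scores, then an ordinary least-squares fit of y on those scores.
    scores = [v1[0] * u + v1[1] * w for u, w in zip(d1, d2)]
    slope = sum(s * (yi - my) for s, yi in zip(scores, y)) / sum(s * s for s in scores)
    # Scores have zero mean, so the intercept is simply mean(y).
    return {"means": (m1, m2), "loading": v1, "intercept": my, "slope": slope}


def pcr_predict(model, x1, x2):
    """Project a new point onto PC1 and apply the fitted line."""
    m1, m2 = model["means"]
    v1 = model["loading"]
    score = v1[0] * (x1 - m1) + v1[1] * (x2 - m2)
    return model["intercept"] + model["slope"] * score


model = pcr_fit([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [3, 6, 9, 12, 15])
print(pcr_predict(model, 6, 12))
```

In this toy case the second predictor is exactly twice the first, so PC1 carries all of the predictor variance (loading proportional to (1, 2)) and the prediction at (6, 12) is 18 up to rounding. With real, collinear air-quality variables the point of PCR is the same: regress on a few uncorrelated components instead of many correlated raw variables.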


A Method for Screening Product Design Variables for Building A Usability Model : Genetic Algorithm Approach (사용편의성 모델수립을 위한 제품 설계 변수의 선별방법 : 유전자 알고리즘 접근방법)

  • Yang, Hui-Cheol;Han, Seong-Ho
    • Journal of the Ergonomics Society of Korea / v.20 no.1 / pp.45-62 / 2001
  • This study suggests a genetic algorithm-based partial least squares (GA-based PLS) method for selecting the design variables used to build a usability model. The GA-based PLS uses a genetic algorithm to minimize the root-mean-squared error of a partial least squares regression model. A multiple linear regression method is then applied to build a usability model containing the variables selected by the GA-based PLS. The performance of this usability model was generally better than that of previous usability models built with other variable selection methods such as expert rating, principal component analysis, cluster analysis, and partial least squares. Furthermore, model performance improved drastically when category-type variables selected by the GA-based PLS were added to the usability model. It is recommended that the GA-based PLS be applied to variable selection when developing usability models.
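A genetic algorithm for variable selection can be sketched as a search over bit masks (one bit per candidate design variable) whose fitness is a model's prediction error. The sketch below is a simplification, not the authors' GA-based PLS: it substitutes ordinary least squares for PLS and scores each mask by RMSE on a held-out validation split. All function names, the GA settings (population 20, 30 generations, 10% mutation, size-3 tournaments, two elites), and the synthetic data are assumptions.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def design(rows, mask):
    """Intercept column plus the predictors switched on in the bit mask."""
    return [[1.0] + [v for v, keep in zip(row, mask) if keep] for row in rows]


def holdout_rmse(mask, X_tr, y_tr, X_va, y_va):
    """Fit OLS via the normal equations on training rows; return validation RMSE."""
    A = design(X_tr, mask)
    k = len(A[0])
    AtA = [[sum(a[i] * a[j] for a in A) for j in range(k)] for i in range(k)]
    Aty = [sum(a[i] * t for a, t in zip(A, y_tr)) for i in range(k)]
    beta = solve(AtA, Aty)
    sq = [(t - sum(bj * v for bj, v in zip(beta, row))) ** 2
          for row, t in zip(design(X_va, mask), y_va)]
    return (sum(sq) / len(sq)) ** 0.5


def ga_select(X_tr, y_tr, X_va, y_va, pop=20, gens=30, p_mut=0.1, seed=0):
    """Return indices of the variables in the best mask found by a simple GA."""
    rng = random.Random(seed)
    p = len(X_tr[0])
    cache = {}

    def fitness(mask):  # lower is better; memoised because masks repeat
        key = tuple(mask)
        if key not in cache:
            cache[key] = holdout_rmse(mask, X_tr, y_tr, X_va, y_va)
        return cache[key]

    population = [[rng.random() < 0.5 for _ in range(p)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness)
        children = population[:2]  # elitism: keep the two best masks
        while len(children) < pop:
            # Tournament selection of two parents, one-point crossover, bit-flip mutation.
            mom = min(rng.sample(population, 3), key=fitness)
            dad = min(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, p)
            child = mom[:cut] + dad[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        population = children
    best = min(population, key=fitness)
    return [j for j, keep in enumerate(best) if keep]
```

Scoring on held-out data matters here: training RMSE never increases when a variable is added to a least-squares model, so a GA minimizing it would simply keep every variable.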


Regression Model With High Reliability by Using Neural Networks (신경망을 이용한 고신뢰성의 회귀분석 모델)

  • Jo, Yong-Hyeon
    • The KIPS Transactions:PartB / v.8B no.4 / pp.327-334 / 2001
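  • This paper proposes a highly reliable regression model using a multilayer neural network trained with a learning algorithm that combines gradient descent and dynamic tunneling. Gradient descent provides optimization with a fast convergence rate, while dynamic tunneling, when training reaches a local optimum, sets new connection weights that escape it so that training converges to the global optimum. In addition, the dimensionality of the training data is reduced by exploiting principal component analysis, which transforms a large volume of input data into a set of statistically independent features; this simultaneously overcomes the limitations that high-dimensional training data impose on the regression model. The proposed network was simulated on an ammonia manufacturing process problem with 3 independent variables and an automobile fuel efficiency problem with 10 independent variables. In both cases it showed better learning and regression performance than a conventional backpropagation network and than a network trained on patterns whose dimensionality had not been reduced by principal component analysis. Zero-mean normalization of the training patterns further improved the performance of the regression network.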
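The zero-mean normalization of training patterns mentioned above can be sketched as subtracting each feature's training mean. This minimal Python sketch assumes the normalization is mean-centering only (the paper may also rescale features); `zero_mean` and `apply_zero_mean` are hypothetical names.

```python
def zero_mean(patterns):
    """Subtract each feature's mean so every column of the training patterns averages zero."""
    n = len(patterns)
    means = [sum(col) / n for col in zip(*patterns)]
    centred = [[v - m for v, m in zip(row, means)] for row in patterns]
    return centred, means


def apply_zero_mean(patterns, means):
    """Shift new (e.g. test) patterns using the means learned from the training data."""
    return [[v - m for v, m in zip(row, means)] for row in patterns]


train, means = zero_mean([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]])
test = apply_zero_mean([[4.0, 25.0]], means)
```

Test patterns are shifted with the training means rather than their own, so training and test inputs share the same origin. Centering is also the first step of principal component analysis, so the same means can be reused before projecting patterns onto the components.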


A Generation and Accuracy Evaluation of Common Metadata Prediction Model Using Public Bicycle Data and Imputation Method

  • Kim, Jong-Chan;Jung, Se-Hoon
    • Journal of Korea Multimedia Society / v.25 no.2 / pp.287-296 / 2022
  • Today, air pollution is becoming a severe issue worldwide, and various policies are being implemented to address environmental pollution. In major cities, public bicycles are installed and operated to reduce pollution and ease transportation problems, and operational information is collected in real time. However, research using public bicycle operation data has rarely been conducted. This study uses daily weather data from the Korea Meteorological Administration and real-time air pollution data from Korea Environment Corporation to predict the number of bicycles rented daily. Cross-validation, principal component analysis, and multiple regression analysis were used to determine the independent variables of the predictive model. The study then selected the variables that satisfied the significance level, constructed a model, predicted the daily number of rented bicycles, and measured its accuracy.

Discrimination of Cultivars and Cultivation Origins from the Sepals of Dry Persimmon Using FT-IR Spectroscopy Combined with Multivariate Analysis (FT-IR 스펙트럼 데이터의 다변량 통계분석을 이용한 곶감의 원산지 및 품종 식별)

  • Hur, Suel Hye;Kim, Suk Weon;Min, Byung Whan
    • Korean Journal of Food Science and Technology / v.47 no.1 / pp.20-26 / 2015
  • This study aimed to establish a rapid system for discriminating the cultivation origins and cultivars of dry persimmons, using metabolite fingerprinting by Fourier transform infrared (FT-IR) spectroscopy combined with multivariate analysis. Whole-cell extracts from the sepals of four Korean cultivars and two Chinese dry persimmons were subjected to FT-IR spectroscopy. Principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) of the FT-IR spectral data successfully separated the six dry persimmons into two groups according to their cultivation origins. Principal component loading values showed that the 1750-1420 and 1190-950 $\mathrm{cm}^{-1}$ regions of the FT-IR spectra contributed most to discriminating cultivation origins. The prediction accuracy of PLS regression was 100% (p<0.01) for cultivation origin and 85.9% (p<0.05) for cultivar. These results show that metabolic fingerprinting with FT-IR spectra can be applied to rapidly discriminate the cultivation origins and cultivars of commercial dry persimmons.

The Reduction of Computation in MLLR Framework using PCA or ICA for Speaker Adaptation (화자적응에서 PCA 또는 ICA를 이용한 MLLR알고리즘 연산량 감소)

  • 김지운;정재호
    • The Journal of the Acoustical Society of Korea / v.22 no.6 / pp.452-456 / 2003
  • We discuss how to reduce the number and dimensions of the matrix inversions required in the MLLR framework for speaker adaptation. To find a smaller, less redundant set of variables that still gives as good a representation as possible, we adopt PCA (principal component analysis) and ICA (independent component analysis). The additional computation incurred by applying PCA or ICA is small enough to be negligible. With 10 components for ICA and 12 components for PCA, performance is similar to that of ordinary MLLR with 36 components. If the dimension of the speaker-independent (SI) model parameters is n, the computation for matrix inversion in MLLR grows as $O(n^4)$. Compared with ordinary MLLR, the total computation required for speaker adaptation is therefore reduced to about 1/81 with PCA and about 1/167 with ICA.
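The reported savings can be checked with a little arithmetic. Assuming the matrix-inversion cost scales as the fourth power of the number of components retained (the abstract states $O(n^4)$), reducing 36 components to 12 (PCA) or to 10 (ICA) gives the ratios below; `cost_ratio` is a hypothetical helper.

```python
def cost_ratio(full_components, reduced_components, power=4):
    """Speed-up factor if inversion cost grows as the given power of the size retained."""
    return (full_components / reduced_components) ** power


print(round(cost_ratio(36, 12), 2))  # PCA: 81.0
print(round(cost_ratio(36, 10), 2))  # ICA: 167.96
```

The ICA ratio comes to about 168, which the abstract reports as roughly 1/167.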

Asymptotic Test for Dimensionality in Sliced Inverse Regression (분할 역회귀모형에서 차원결정을 위한 점근검정법)

  • Park, Chang-Sun;Kwak, Jae-Guen
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.381-393 / 2005
  • As a promising technique for dimension reduction in regression analysis, Sliced Inverse Regression (SIR) and an associated chi-square test for dimensionality were introduced by Li (1991). However, Li's test assumes normally distributed predictors and has been found to depend heavily on the number of slices. We provide a unified asymptotic test for determining the dimensionality of the SIR model that is based on probabilistic principal component analysis and does not assume normality of the predictors. Illustrative results with simulated and real examples are also provided.
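For reference, Li's original chi-square test, the one this paper seeks to improve on, can be sketched as follows. Given the eigenvalues of the SIR matrix (the weighted covariance of the slice means of the standardized predictors), the statistic for $H_0$: dimension $= d$ is $n$ times the sum of the $p-d$ smallest eigenvalues, referred to a $\chi^2$ distribution with $(p-d)(H-d-1)$ degrees of freedom, where $H$ is the number of slices; that reference distribution relies on normal predictors. The eigenvalues below are hypothetical inputs, the function names are hypothetical, and the paper's PPCA-based test is not reproduced.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable, for integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Start from Q(1, h) = exp(-h) for even df or Q(1/2, h) = erfc(sqrt(h)) for odd df,
    # then step up with Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


def li_sir_test(eigenvalues, n, d, n_slices):
    """Li's (1991) normal-theory test of H0: the SIR dimension equals d."""
    lam = sorted(eigenvalues, reverse=True)
    p = len(lam)
    stat = n * sum(lam[d:])  # n times the sum of the p - d smallest eigenvalues
    df = (p - d) * (n_slices - d - 1)
    return stat, df, chi2_sf(stat, df)


stat, df, p_value = li_sir_test([0.6, 0.05, 0.02, 0.01], n=100, d=1, n_slices=5)
```

A large p-value means the data do not contradict a one-dimensional structure; the abstract's point is that this conclusion can shift with `n_slices` and with non-normal predictors.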

A gradient boosting regression based approach for energy consumption prediction in buildings

  • Bataineh, Ali S. Al
    • Advances in Energy Research / v.6 no.2 / pp.91-101 / 2019
  • This paper proposes an efficient data-driven approach to building models that predict energy consumption in buildings. The data used in this research were collected by installing humidity and temperature sensors at different locations in a building. Weather data from a nearby weather station were also included in the dataset to study the impact of weather conditions on energy consumption. One of the main emphases of this research is to make feature selection independent of domain knowledge. Therefore, two approaches to extracting useful features from the data are tested: feature selection through principal component analysis, and relative importance-based feature selection in the original domain. The regression model is a gradient boosting regressor whose optimal parameters are chosen through a two-stage coarse-to-fine search. Model performance is evaluated with metrics such as the $R^2$ score and root-mean-squared error. The best performance was achieved when relative importance-based feature selection was used with the gradient boosting regressor. The proposed technique also outperformed support vector machine and neural network-based approaches tested on the same dataset.
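Least-squares gradient boosting can be sketched as repeatedly fitting a small tree to the current residuals (the negative gradient of the squared loss) and adding a shrunken copy of it to the ensemble. The sketch below assumes depth-1 trees (stumps), a fixed number of rounds, and a fixed learning rate; these values and all names are hypothetical, whereas the paper tuned its parameters with a coarse-to-fine search. In practice a library implementation such as scikit-learn's `GradientBoostingRegressor` would normally be used.

```python
def fit_stump(X, r):
    """Best single split (feature, threshold) minimising the squared error of residuals r."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [ri for row, ri in zip(X, r) if row[j] <= thr]
            right = [ri for row, ri in zip(X, r) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:  # every feature is constant: predict the mean residual
        m = sum(r) / len(r)
        return (0, float("inf"), m, m)
    return best[1:]


def stump_predict(stump, row):
    j, thr, lm, rm = stump
    return lm if row[j] <= thr else rm


def gradient_boost(X, y, n_rounds=100, learning_rate=0.1):
    """Least-squares gradient boosting: each stump is fitted to the current residuals."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is just the residual y - F(x).
        resid = [t - p for t, p in zip(y, pred)]
        stump = fit_stump(X, resid)
        stumps.append(stump)
        pred = [p + learning_rate * stump_predict(stump, row) for p, row in zip(pred, X)]
    return f0, learning_rate, stumps


def boost_predict(model, row):
    f0, lr, stumps = model
    return f0 + lr * sum(stump_predict(s, row) for s in stumps)
```

The learning rate trades speed for robustness: each round removes only a fraction of the remaining error, which is why the number of rounds and the learning rate are usually tuned together.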

Assessment through Statistical Methods of Water Quality Parameters (WQPs) in the Han River in Korea

  • Kim, Jae Hyoun
    • Journal of Environmental Health Sciences / v.41 no.2 / pp.90-101 / 2015
  • Objective: This study was conducted to develop a chemical oxygen demand (COD) regression model using water quality monitoring data (January 2014) from the Han River auto-monitoring stations. Methods: Surface water quality data from 198 sampling stations in six major areas were assembled and analyzed to determine the spatial distribution and clustering of monitoring stations based on 18 WQPs, and to build regression models from selected parameters. Statistical techniques, including combined genetic algorithm-multiple linear regression (GA-MLR), cluster analysis (CA), and principal component analysis (PCA), were used to build a COD model from the water quality data. Results: The best GA-MLR model was a 5-descriptor COD model with satisfactory statistics ($r^2=92.64$, $Q^2_{\mathrm{LOO}}=91.45$, $Q^2_{\mathrm{Ext}}=88.17$). This approach includes selecting the WQPs that are the most important factors affecting water quality. Additionally, ordination techniques such as PCA and CA were used to classify the monitoring stations. The biplot of the first two principal components (PCs) of the PCA model identified three distinct groups of stations that also differ in their correlations with the WQPs, which enables better interpretation of the water quality characteristics at particular stations as of January 2014. Conclusion: This data analysis procedure appears to provide an efficient means of modelling water quality by interpreting and defining its most essential variables, such as TOC and BOD. The water parameters selected in the COD model as most important for environmental health and water pollution can be used in water quality management strategies. At present, the river is threatened by anthropogenic disturbances during festival periods, especially in upstream areas.
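The $Q^2_{\mathrm{LOO}}$ statistic compares leave-one-out prediction error with total variation: $Q^2_{\mathrm{LOO}} = 1 - \mathrm{PRESS}/\mathrm{TSS}$, where PRESS sums the squared errors of predicting each observation from a model refitted without it. A minimal sketch for a single-predictor linear model follows (the study's model had five descriptors); the function names are hypothetical, and TSS is taken about the full-data mean, which is one common convention.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r2_and_q2_loo(x, y):
    """R^2 of the full fit and leave-one-out Q^2 = 1 - PRESS / TSS."""
    my = sum(y) / len(y)
    tss = sum((v - my) ** 2 for v in y)
    a, b = fit_line(x, y)
    rss = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))
    press = 0.0
    for i in range(len(x)):
        # Refit without observation i, then predict it.
        ai, bi = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (ai + bi * x[i])) ** 2
    return 1 - rss / tss, 1 - press / tss
```

Because each left-out point is predicted by a model that never saw it, PRESS is at least as large as the residual sum of squares, so for least-squares fits $Q^2_{\mathrm{LOO}}$ does not exceed $r^2$; a small gap between the two suggests the model is not overfitted.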

Factors Contributing to Winning in Ice Hockey: Analysis of 2017 Ice Hockey World Championship (2017 International Ice Hockey Federation World Championship의 승리 결정요인 분석)

  • Lee, Jusung;Kim, Hyeyoung;Kim, Chaeeun;Pathak, Prabhat;Moon, Jeheon
    • 한국체육학회지인문사회과학편 / v.57 no.4 / pp.387-394 / 2018
  • The purpose of this study is to inform strategy by identifying the main variables that determine the winning team, based on the records of all games in the top division of the 2017 IIHF World Championship. A total of 64 matches were analyzed. Six variables were examined: ratio of saves, shots on goal, penalties in minutes, time on power play, power play goals, and face-off wins. Logistic regression analysis (LRA), multiple regression analysis (MRA), and principal component analysis (PCA) were used to examine the relationship between these variables and wins and losses. In the LRA, shots on goal (p<.001) and face-off wins (p<.001) were significantly positively related to winning a game, whereas penalties in minutes (p<.01) and time on power play (p<.01) were significantly negatively related. In the MRA, win percentage was significantly positively correlated with ratio of saves (p<.01) and face-off wins (p<.001), and significantly negatively correlated with penalties in minutes (p<.001). In the PCA, the data for winning teams yielded penalty, attack, and defense factors, whereas the data for losing teams yielded only attack and defense factors.
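A one-predictor logistic regression of win/loss on a game statistic can be sketched with batch gradient ascent on the log-likelihood. The data below are hypothetical, not the study's 64 matches; the predictor is standardized first so the coefficient is the change in log-odds per standard deviation, and the learning rate and iteration count are assumptions.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(x, y, lr=0.5, n_iter=2000):
    """One-predictor logistic regression by batch gradient ascent on the log-likelihood.

    The predictor is standardised first, so b1 is the change in log-odds per SD."""
    n = len(x)
    mean = sum(x) / n
    sd = (sum((v - mean) ** 2 for v in x) / (n - 1)) ** 0.5
    z = [(v - mean) / sd for v in x]
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for zi, yi in zip(z, y):
            err = yi - sigmoid(b0 + b1 * zi)  # per-observation log-likelihood gradient
            g0 += err
            g1 += err * zi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


shots = [20, 25, 28, 30, 32, 35, 38, 40]  # hypothetical shots on goal
won = [0, 0, 1, 0, 1, 0, 1, 1]            # hypothetical outcomes (1 = win)
b0, b1 = fit_logistic(shots, won)
```

`math.exp(b1)` then gives the odds ratio for winning per one-SD increase in the statistic; a positive `b1` is the direction the study reports for shots on goal and face-off wins.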