• Title/Summary/Keyword: Multiple Linear Regression(MLR)

Prediction of Gas Chromatographic Retention Times of PAH Using QSRR (기체크로마토그래피에서 QSRR을 통한 PAH 용리시간 예측)

  • Kim, Young Gu
    • Journal of the Korean Chemical Society / v.45 no.5 / pp.422-428 / 2001
  • Relative retention times (RRTs) of PAH molecules and their derivatives in gas chromatography were fitted on training sets and predicted for testing sets using multiple linear regression (MLR) and an artificial neural network (ANN). The main descriptors of PAHs and their derivatives in QSRR are the square root of molecular weight (sqmw), molecular connectivity ($^1{\chi}_v$), molecular dipole moment (D) and length-to-breadth ratio (L/B). The MLR results show that a heavy molecule tends to have a long retention time. L/B, which is closely related to the slot model, is a good descriptor in MLR. On the other hand, the ANN, which is not affected by linear dependencies among the descriptors, was based exclusively on molecular weight and molecular dipole moment. The variances, which indicate the accuracy of prediction for retention times in the testing sets, are 1.860 and 0.206 for MLR and ANN, respectively. It was shown that ANN can exceed MLR in prediction accuracy.
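As a rough illustration of the MLR step described in this abstract, the sketch below fits an ordinary least-squares model with an intercept by solving the normal equations. The function names `fit_mlr` and `predict`, the choice of two descriptors, and all numeric values are hypothetical and synthetic; none of them come from the paper.

```python
def fit_mlr(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for k in range(p):
            if k != c:
                f = A[k][c]
                A[k] = [vk - f * vc for vk, vc in zip(A[k], A[c])]
    return [A[i][p] for i in range(p)]  # [intercept, b1, b2, ...]


def predict(coef, row):
    """Evaluate the fitted linear model for one descriptor row."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], row))


# Synthetic descriptors (NOT data from the paper): [sqrt(MW), L/B]
X = [[12.0, 1.2], [13.4, 1.5], [14.2, 1.1],
     [15.0, 1.8], [15.9, 1.4], [16.7, 1.6]]
# Synthetic response built from a known linear rule, so the fit is checkable
y = [2.0 * a + 3.0 * b - 5.0 for a, b in X]

coef = fit_mlr(X, y)
```

Because the synthetic response is exactly linear, the fitted coefficients should reproduce the rule used to generate it; with real retention data the residual variance on a held-out testing set would be the quantity of interest.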

Chemical Oxygen Demand (COD) Model for the Assessment of Water Quality in the Han River, Korea (한강수질 평가를 위한 COD (화학적 산소 요구량) 모델 평가)

  • Kim, Jae Hyoun;Jo, Jinnam
    • Journal of Environmental Health Sciences / v.42 no.4 / pp.280-292 / 2016
  • Objectives: The objective of this study was to build COD regression models for the Han River and evaluate water quality. Methods: Water quality data sets for the dry season (as of January) during a four-year period (2012-2015) were collected from the database of the Han River automatic water quality monitoring stations. Statistical techniques, including combined genetic algorithm-multiple linear regression (GA-MLR), were used to build five-descriptor COD models. Multivariate statistical techniques such as principal component analysis (PCA) and cluster analysis (CA) are useful tools for extracting meaningful information. Results: The $r^2$ values of the best COD models were high (> 0.8) between 2012 and 2015. Total organic carbon (TOC) was a surrogate indicator for COD (as COD/TOC) with high reliability ($r^2=0.63$ in 2012, $r^2=0.75$ in 2013, $r^2=0.79$ in 2014 and $r^2=0.85$ in 2015). The COD/TOC ratios were calculated as 2.08 in 2012, 1.79 in 2013, 1.52 in 2014 and 1.45 in 2015, indicating that biodegradability in the water body of the Han River was being sustained, thereby further improving water quality. The BOD/COD ratio supported these findings. The cluster analysis revealed higher annual levels of microorganisms and phosphorus at stations along the Hangang-Seoul and Hantangang areas. Nevertheless, the overall water quality over the last four years showed an observable trend toward continuous improvement. These findings also suggest that non-point pollution control strategies should consider the influence of upstream and downstream areas to protect water quality in the Han River. Conclusion: This data analysis procedure provided an efficient and comprehensive tool for interpreting complex water quality data matrices. Results from a trend analysis provided important information about sources and parameters for Han River water quality management.

Quantitative Structure Activity Relationship Prediction of Oral Bioavailabilities Using Support Vector Machine

  • Fatemi, Mohammad Hossein;Fadaei, Fatemeh
    • Journal of the Korean Chemical Society / v.58 no.6 / pp.543-552 / 2014
  • A quantitative structure-activity relationship (QSAR) study was performed to model and predict the oral bioavailabilities of a diverse set of 216 drugs. After calculation and screening of molecular descriptors, linear and nonlinear models were developed using multiple linear regression (MLR), artificial neural network (ANN), support vector machine (SVM) and random forest (RF) techniques. Comparison of the statistical parameters of these models indicates that SVM is more suitable than the other models. The root mean square errors of the SVM model were 5.933 and 4.934 for the training and test sets, respectively. The robustness and reliability of the developed SVM model were evaluated by a leave-many-out cross-validation test, which produced the statistics $Q^2_{SVM}=0.603$ and SPRESS = 7.902. Moreover, the chemical applicability domain of the model was determined via the leverage approach. The results of this study show the applicability of the QSAR approach using SVM to the prediction of oral bioavailability of drugs.

Estimation of LOADEST coefficients according to watershed characteristics (유역특성에 따른 LOADEST 회귀모형 매개변수 추정)

  • Kim, Kyeung;Kang, Moon Seong;Song, Jung Hun;Park, Jihoon
    • Journal of Korea Water Resources Association / v.51 no.2 / pp.151-163 / 2018
  • The objective of this study was to estimate LOADEST (LOAD Estimator) coefficients for simulating pollutant loads in ungauged watersheds. The regression models of LOADEST were used to simulate pollutant loads, and multiple linear regression (MLR) was used to estimate the coefficients from watershed characteristics. The fifth and third models of LOADEST were selected to simulate T-N (Total Nitrogen) and T-P (Total Phosphorus) loads, respectively. The results and statistics indicated that the regression models based on LOADEST simulated pollutant loads reasonably and that the model coefficients were reliable. However, the results also indicated that LOADEST underestimated pollutant loads and had a bias. For this reason, the bias in the simulated loads was corrected by a quantile mapping method in this study. The corrected loads indicated that the bias correction was effective. Using multiple regression analysis, a coefficient estimation method based on watershed characteristics was developed. The coefficients calculated by MLR were then used in the models. The simulated results and statistics indicated that MLR estimated the model coefficients reasonably. The regression models developed in this study would help simulate pollutant loads for ungauged watersheds and could serve as a screening model for policy decisions.
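The quantile mapping correction mentioned in this abstract can be sketched as follows. This is a minimal empirical variant, assuming a reference period with paired samples of simulated and observed loads; the abstract does not say which variant the authors used. The function name `quantile_map` and all values are hypothetical.

```python
from bisect import bisect_right


def quantile_map(sim_ref, obs_ref, value):
    """Replace a simulated value by the observed value at the same
    empirical quantile of the reference samples."""
    sim_sorted = sorted(sim_ref)
    obs_sorted = sorted(obs_ref)
    n_sim, n_obs = len(sim_sorted), len(obs_sorted)
    # Number of reference simulations <= value (empirical rank)
    rank = bisect_right(sim_sorted, value)
    # Index of the matching quantile in the observed sample (integer ceiling)
    idx = (rank * n_obs + n_sim - 1) // n_sim - 1
    idx = min(max(idx, 0), n_obs - 1)  # clamp values outside the reference range
    return obs_sorted[idx]


# Synthetic reference samples (NOT data from the paper): the simulation
# underestimates the observations by a factor of two.
sim_ref = [1.0, 2.0, 3.0, 4.0, 5.0]
obs_ref = [2.0, 4.0, 6.0, 8.0, 10.0]

corrected = [quantile_map(sim_ref, obs_ref, v) for v in [1.0, 3.0, 5.0]]
```

In this toy case each simulated load is mapped to the observed load of the same rank, which removes the systematic underestimation. Values outside the reference range are clamped to the nearest observed extreme, a simplification that a production implementation would likely replace with extrapolation.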

Quantitative Structure-Activity Relationship (QSAR) Study of New Fluorovinyloxyacetamides

  • Jo, Du Ho;Lee, Seong Gwang;Kim, Beom Tae;No, Gyeong Tae
    • Bulletin of the Korean Chemical Society / v.22 no.4 / pp.388-394 / 2001
  • Quantitative structure-activity relationships (QSAR) have been established for 57 fluorovinyloxyacetamide compounds to correlate and predict EC50 values. A genetic algorithm (GA) and multiple linear regression (MLR) analysis were used to select the descriptors and to generate the equations that relate the structural features to the biological activities. The resulting equation consists of three descriptors calculated from the molecular structures with molecular mechanics and quantum-chemical methods. The results of MLR and GA show that the dipole moment along the z-axis, the radius of gyration and logP play an important role in growth inhibition of barnyard grass.

Nonlinear QSAR Study of Xanthone and Curcuminoid Derivatives as α-Glucosidase Inhibitors

  • Saihi, Youcef;Kraim, Khairedine;Ferkous, Fouad;Djeghaba, Zeineddine;Azzouzi, Abdelkader;Benouis, Sabrina
    • Bulletin of the Korean Chemical Society / v.34 no.6 / pp.1643-1650 / 2013
  • A nonlinear QSAR model was constructed for a series of 57 xanthone and curcuminoid derivatives as ${\alpha}$-glucosidase inhibitors using a back-propagation neural network. The neural network architecture was optimized to obtain a three-layer network with five input descriptors, nine hidden neurons and one output neuron. A good predictive determination coefficient was obtained (${R^2}_{Pset}$ = 86.7%), and the statistical results were better than those obtained with the same data set using multiple regression analysis (MLR). As in the MLR model, the descriptor MATS7v, weighted by van der Waals volume, was found to be the most important independent variable for ${\alpha}$-glucosidase inhibitory activity.

Quantitative Structure-Activity Relationships for Radical Scavenging Activities of Flavonoid Compounds by GA-MLR Technique

  • Om, Ae-Son;Ryu, Jae-Chun;Kim, Jae-Hyoun
    • Molecular & Cellular Toxicology / v.4 no.2 / pp.170-176 / 2008
  • The quantitative structure-activity relationship (QSAR) of a set of 35 flavonoid compounds with antioxidant activity was established by means of the Genetic Algorithm-Multiple Linear Regression (GA-MLR) technique. Four-parameter models were obtained for two data sets, the 1,1-diphenyl-2-picrylhydrazyl (DPPH) radical scavenging activity ($R^2=0.788$, $Q^2_{cv}=0.699$ and $Q^2_{ext}=0.577$) and the scavenging activity of reactive oxygen species (ROS) induced by $H_2O_2$ ($R^2=0.829$, $Q^2_{cv}=0.754$ and $Q^2_{ext}=0.573$), both with low external predictive ability on a mass basis. Each model revealed somewhat different mechanistic aspects of the radical scavenging activity of the flavonoid compounds tested. Topological charge, H-bonding complex and deprotonation processes were likely to be involved in the radical scavenging activity.

Near Infrared Reflectance Spectroscopy for Non-Invasive Measuring of Internal Quality of Apple Fruit

  • Sohn, Mi-Ryeong;Park, Woo-Churl;Cho, Rae-Kwang
    • Near Infrared Analysis / v.1 no.1 / pp.27-30 / 2000
  • In this study, we investigated the feasibility of non-destructive determination of internal quality factors of Fuji apple fruit using near infrared (NIR) reflectance spectroscopy and developed calibration models. A refractometer, titration and a texture analyzer were used as the reference methods for sugar content, acidity and firmness, respectively. Samples were scanned from 1100 to 2500 nm with an InfraAlyzer 500C spectrometer, and SESAME software was used for data analysis. A multiple linear regression (MLR) analysis was performed to develop the calibrations. The correlation coefficient (R) and standard error of prediction (SEP) were as follows: 0.91 and 0.41$^{\circ}$Brix for sugar content, 0.90 and 0.04% for acidity, and 0.84 and 0.094 kg for firmness. This study shows that NIR spectroscopy can be used to evaluate the sugar content, acidity and firmness of apple fruit with acceptable accuracy.

Comparison of Performance of Models to Predict Hardness of Tomato using Spectroscopic Data of Reflectance and Transmittance (토마토 반사광과 투과광 스펙트럼 분석에 의한 경도 예측 성능 비교)

  • Kim, Young-Tae;Suh, Sang-Ryong
    • Journal of Biosystems Engineering / v.33 no.1 / pp.63-68 / 2008
  • This study was carried out to find a useful method for predicting the hardness of tomato using optical spectrum data. Reflectance and transmittance spectra were collected and processed by nine preprocessing methods: normalization by mean, maximum and range; SNV (standard normal variate); MSC (multiplicative scatter correction); and the first and second derivatives of Savitzky-Golay and Norris-Gap. With the preprocessed and unprocessed original spectrum data, prediction models for the hardness of tomato were developed using PLS (partial least squares) and MLR (multiple linear regression) and then validated. The validation showed that PLS and MLR gave similar performance, while the transmittance spectra gave much better results than the reflectance spectra.
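Of the preprocessing methods listed in this abstract, SNV is the simplest to show: each spectrum is centered on its own mean and scaled by its own standard deviation. The sketch below assumes the usual sample standard deviation (n - 1 denominator); the function name `snv` and the intensity values are hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale a single spectrum
    by its own mean and sample standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in spectrum]


# Two synthetic spectra (NOT data from the paper) that differ only by a
# multiplicative scatter factor of 2
spectrum_a = [1.0, 2.0, 3.0]
spectrum_b = [2.0, 4.0, 6.0]

snv_a = snv(spectrum_a)
snv_b = snv(spectrum_b)
```

The two synthetic spectra become identical after SNV, which is the point of the transform: multiplicative and additive scatter effects are removed before a PLS or MLR model is fitted.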

Quantum Chemical Studies of Some Sulphanilamide Schiff Bases Inhibitor Activity Using QSAR Methods

  • Baher, Elham;Darzi, Naser;Morsali, Ali;Beyramabadi, Safar Ali
    • Journal of the Korean Chemical Society / v.59 no.6 / pp.483-487 / 2015
  • Quantum chemical descriptors calculated by the DFT method were used to predict the inhibitor activity of some sulphanilamide Schiff bases, expressed as a binding constant (log K). Multiple linear regression (MLR) and an artificial neural network (ANN) were employed to develop a useful quantitative structure-activity relationship (QSAR) model. The results showed the superiority of the ANN model over the MLR one. The proposed QSAR model is easy to compute and physicochemically interpretable. Sensitivity analysis was used to determine the relative importance of each descriptor in the ANN model. According to this analysis, the order of importance of the descriptors is molecular volume, molecular weight and dipole moment. These descriptors appear to carry useful information on how structural differences among the sulphanilamide Schiff bases contribute to their inhibitor activity.