• Title/Summary/Keyword: prediction error methods

Compositional Feature Selection and Its Effects on Bandgap Prediction by Machine Learning (기계학습을 이용한 밴드갭 예측과 소재의 조성기반 특성인자의 효과)

  • Chunghee Nam
    • Korean Journal of Materials Research
    • /
    • v.33 no.4
    • /
    • pp.164-174
    • /
    • 2023
  • The bandgap of a semiconductor material is an important factor when using it in various applications. In this study, based on data provided by AFLOW (Automatic-FLOW for Materials Discovery), the bandgap of a semiconductor material was predicted using only the material's compositional features. The compositional features were generated using the Python modules 'Pymatgen' and 'Matminer'. Pearson's correlation coefficients (PCC) between the compositional features were calculated, and features with a correlation coefficient larger than 0.95 were removed in order to avoid overfitting. The bandgap prediction performance was compared using the R² score and root-mean-squared error. Predicting the bandgap with random forest and XGBoost as representative ensemble algorithms showed that XGBoost gave better results after cross-validation and hyper-parameter tuning. To investigate the effect of compositional feature selection on the bandgap prediction of the machine learning model, the prediction performance was studied as a function of the number of features, selected using feature importance methods. There were no significant changes in prediction performance beyond an appropriate number of features. Furthermore, artificial neural networks were employed to compare the prediction performance by adjusting the number of features guided by the PCC values, resulting in a best R² score of 0.811. By comparing and analyzing the bandgap distribution and prediction performance according to the material group containing specific elements (F, N, Yb, Eu, Zn, B, Si, Ge, Fe, Al), various information for material design was obtained.
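The correlation-based filtering step described in this abstract can be illustrated with a minimal sketch. It assumes the absolute value of the PCC is compared with the 0.95 threshold and that the later feature of a correlated pair is the one dropped; the abstract specifies neither. The feature names and values are hypothetical toy data, not AFLOW data, and the paper itself used Pymatgen and Matminer rather than this hand-written code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treated as uncorrelated (assumption)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.95):
    """Keep a feature only if |PCC| with every already-kept feature
    does not exceed the threshold (greedy, order-dependent)."""
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy feature columns (one value per material).
features = {
    "mean_atomic_mass": [1.0, 2.0, 3.0, 4.0, 5.0],
    "mean_atomic_number": [2.1, 4.0, 6.1, 8.0, 10.1],  # nearly proportional to the first
    "electronegativity_range": [0.5, 0.1, 0.9, 0.3, 0.7],
}

selected = drop_correlated(features)
print(selected)  # ['mean_atomic_mass', 'electronegativity_range']
```

Because the filter is greedy, the result depends on the order of the features; a different tie-breaking rule (for example, keeping the feature with the higher importance score) would be an equally plausible reading of the abstract.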

Experimental Study for Modal Parameter Estimation of Structural Systems (구조물의 자유진동특성 추정을 위한 실험적 연구)

  • 윤정방;이형진
    • Proceedings of the Computational Structural Engineering Institute Conference
    • /
    • 1994.10a
    • /
    • pp.175-182
    • /
    • 1994
  • For the safety evaluation of existing large-scale structures, methods for estimating structural and dynamic properties are studied. A sequential prediction error method in the time domain and an improved FRF estimator in the frequency domain are compared. For this purpose, impact tests of a 2-bay, 3-floor steel frame structure were performed. Results from both methods are found to be consistent with each other; however, those from the finite-element analysis differ slightly from the experimental results.

Application of deep learning with bivariate models for genomic prediction of sow lifetime productivity-related traits

  • Joon-Ki Hong;Yong-Min Kim;Eun-Seok Cho;Jae-Bong Lee;Young-Sin Kim;Hee-Bok Park
    • Animal Bioscience
    • /
    • v.37 no.4
    • /
    • pp.622-630
    • /
    • 2024
  • Objective: Pig breeders cannot obtain phenotypic information at the time of selection for sow lifetime productivity (SLP). They would benefit from obtaining genetic information on candidate sows. Genomic data interpreted using deep learning (DL) techniques could contribute to the genetic improvement of SLP to maximize farm profitability because DL models capture nonlinear genetic effects such as dominance and epistasis more efficiently than conventional genomic prediction methods based on linear models. This study aimed to investigate the usefulness of DL for the genomic prediction of two SLP-related traits: lifetime number of litters (LNL) and lifetime pig production (LPP). Methods: Two bivariate DL models, convolutional neural network (CNN) and local convolutional neural network (LCNN), were compared with conventional bivariate linear models (i.e., genomic best linear unbiased prediction, Bayesian ridge regression, Bayes A, and Bayes B). Phenotype and pedigree data were collected from 40,011 sows that had husbandry records. Among these, 3,652 pigs were genotyped using the PorcineSNP60K BeadChip. Results: The best predictive correlation for LNL was obtained with CNN (0.28), followed by LCNN (0.26) and conventional linear models (approximately 0.21). For LPP, the best predictive correlation was also obtained with CNN (0.29), followed by LCNN (0.27) and conventional linear models (approximately 0.25). A similar trend was observed with the mean squared error of prediction for the SLP traits. Conclusion: This study provides an example of a CNN that can outperform linear model-based genomic prediction approaches when nonlinear interaction components are important, as LNL and LPP exhibited strong epistatic interaction components. Additionally, our results suggest that applying bivariate DL models could also improve prediction accuracy by utilizing the genetic correlation between LNL and LPP.

Modeling of Co(II) adsorption by artificial bee colony and genetic algorithm

  • Ozturk, Nurcan;Senturk, Hasan Basri;Gundogdu, Ali;Duran, Celal
    • Membrane and Water Treatment
    • /
    • v.9 no.5
    • /
    • pp.363-371
    • /
    • 2018
  • This work investigated the usability of the artificial bee colony (ABC) and genetic algorithm (GA) methods in modeling the adsorption of Co(II) onto drinking water treatment sludge (DWTS). DWTS, obtained as an inevitable byproduct at the end of the drinking water treatment stages, was used as an adsorbent without any physical or chemical pre-treatment in the adsorption experiments. First, DWTS was characterized using various analytical procedures such as elemental, FT-IR, SEM-EDS, XRD, XRF and TGA/DTA analysis. Then, adsorption experiments were carried out in a batch system, and the Co(II) removal potential of DWTS was modelled via the ABC and GA methods, taking the effects of certain experimental parameters (initial pH, contact time, initial Co(II) concentration, DWTS dosage) as the input parameters. The accuracy of the ABC and GA methods was determined by applying them to four different functions: quadratic, exponential, linear and power. Several statistical indices (sum of squared errors, root mean square error, mean absolute error, average relative error, and determination coefficient) were used to evaluate the performance of these models. The ABC and GA methods with the quadratic form gave the best predictions. As a result, it was shown that ABC and GA can be used to optimize the regression function coefficients in modeling adsorption experiments.

Reliability Computation of Neuro-Fuzzy Models : A Comparative Study (뉴로-퍼지 모델의 신뢰도 계산 : 비교 연구)

  • 심현정;박래정;왕보현
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.11 no.4
    • /
    • pp.293-301
    • /
    • 2001
  • This paper reviews three methods to compute a pointwise confidence interval of neuro-fuzzy models and compares their estimation performance through simulations. The computation methods under consideration include stacked generalization using cross-validation, predictive error bar in regressive models, and local reliability measure for the networks employing a local representation scheme. These methods implemented on the neuro-fuzzy models are applied to the problems of simple function approximation and chaotic time series prediction. The results of reliability estimation are compared both quantitatively and qualitatively.

Robustness of model averaging methods for the violation of standard linear regression assumptions

  • Lee, Yongsu;Song, Juwon
    • Communications for Statistical Applications and Methods
    • /
    • v.28 no.2
    • /
    • pp.189-204
    • /
    • 2021
  • In a regression analysis, a single best model is usually selected among several candidate models. However, it is often useful to combine several candidate models to achieve better performance, especially from a prediction viewpoint. Model combining methods such as stacking and Bayesian model averaging (BMA) have been suggested from the perspective of averaging candidate models. When the candidate models include the true model, it is expected that BMA generally gives better performance than stacking. On the other hand, when the candidate models do not include the true model, it is known that stacking outperforms BMA. Since the stacking and BMA approaches have different properties, it is difficult to determine which method is more appropriate in other situations. In particular, it is not easy to find research papers that compare stacking and BMA when regression model assumptions are violated. Therefore, in this paper, we compare the performance of model averaging methods as well as a single best model in linear regression analysis when standard linear regression assumptions are violated. Simulations were conducted to compare model averaging methods with linear regression both when the data include outliers and when they do not. We also compared them when the data include errors from a non-normal distribution. The model averaging methods were applied to water pollution data, which have strong multicollinearity among variables. Simulation studies showed that the stacking method tends to give better performance than BMA or standard linear regression analysis (including the stepwise selection method) in the sense of risks (see (3.1)) or prediction error (see (3.2)) when typical linear regression assumptions are violated.

Development of Models for the Prediction of Domestic Red Pepper (Capsicum annuum L.) Powder Capsaicinoid Content using Visible and Near-infrared Spectroscopy

  • Lim, Jongguk;Mo, Changyeun;Kim, Giyoung;Kim, Moon S.;Lee, Hoyoung
    • Journal of Biosystems Engineering
    • /
    • v.40 no.1
    • /
    • pp.47-60
    • /
    • 2015
  • Purpose: The purpose of this study was to non-destructively and quickly predict the capsaicinoid content of domestic red pepper powders from various areas of Korea using a pungency measurement system in combination with visible and near-infrared (VNIR) spectroscopic techniques. Methods: The reflectance spectra of 149 red pepper powder samples from 14 areas of Korea were obtained in the wavelength range of 450-950 nm and partial least squares regression (PLSR) models for the prediction of capsaicinoid content were developed using area models. Results: The determination coefficient of validation ($R_V^2$), standard error of prediction (SEP), and residual prediction deviation (RPD) for the capsaicinoid content prediction model for the Namyoungyang area were 0.985, $\pm 2.17$ mg/100 g, and 7.94, respectively. Conclusions: These results show the possibility of VNIR spectroscopy combined with PLSR models in the non-destructive and facile prediction of capsaicinoid content of red pepper powders from Korea.
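The three validation statistics named in this abstract can be sketched as follows. The abstract does not give the formulas, so this assumes the definitions commonly used in chemometrics: SEP as the bias-corrected standard deviation of the prediction residuals, RPD as the standard deviation of the reference values divided by SEP, and the determination coefficient as 1 minus the ratio of residual to total sum of squares. The function name and the numbers are hypothetical.

```python
import math


def validation_metrics(reference, predicted):
    """Return R^2, SEP and RPD for a validation set (assumed definitions)."""
    n = len(reference)
    residuals = [p - r for p, r in zip(predicted, reference)]
    bias = sum(residuals) / n

    # SEP: standard deviation of the residuals after removing the bias.
    sep = math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))

    mean_ref = sum(reference) / n
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    ss_res = sum(e ** 2 for e in residuals)

    r2 = 1.0 - ss_res / ss_tot            # determination coefficient
    sd_ref = math.sqrt(ss_tot / (n - 1))  # sample SD of the reference values
    rpd = sd_ref / sep                    # residual prediction deviation
    return {"r2": r2, "sep": sep, "rpd": rpd}


# Hypothetical reference and predicted contents, in mg/100 g.
reference = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [11.0, 19.0, 31.0, 39.0, 50.0]

metrics = validation_metrics(reference, predicted)
print(metrics)
```

Under these definitions a larger RPD means the prediction error is small relative to the spread of the reference values.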

Groundwater Level Prediction Using ANFIS Algorithm (ANFIS 알고리즘을 이용한 지하수수위 예측)

  • Bak, Gwi-Man;Bae, Young-Chul
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.14 no.6
    • /
    • pp.1235-1240
    • /
    • 2019
  • It is well known that the groundwater level changes rapidly before and after an earthquake, and variations in the groundwater level are used to predict earthquakes. In this paper, we predict the groundwater level in Miryang City using the ANFIS algorithm for earthquake prediction. For this purpose, we used precipitation and temperature data acquired from the National Weather Service and groundwater level data from the Rural Groundwater Observation Network of the Korea Rural Community Corporation, installed in Miryang City, Gyeongsangnam-do. We measure the prediction accuracy using RMSE and MAPE. The results show that the periodic pattern driven by natural factors was predicted, but changes in the groundwater level caused by other variables, such as artificial factors, were not detected. To solve this problem, it is necessary to quantify the artificial variables affecting the groundwater level numerically, and to measure the precipitation and pressure at the exact location of the observation well measuring the groundwater level.
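The two accuracy measures named in this abstract, RMSE and MAPE, can be computed as in the sketch below. It uses the standard textbook definitions, which the abstract does not spell out, and the observed and forecast values are hypothetical, not data from the Miryang observation network.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so that case is rejected.
    """
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical groundwater levels in metres.
observed = [10.0, 12.0, 8.0, 10.0]
forecast = [11.0, 12.0, 6.0, 10.0]

print(rmse(observed, forecast))
print(mape(observed, forecast))
```

RMSE is expressed in the units of the measured quantity and penalizes large errors more heavily, while MAPE is scale-free, which is one reason the two are often reported together.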

Prediction of Heavy Metal Content in Compost Using Near-infrared Reflectance Spectroscopy

  • Ko, H.J.;Choi, H.L.;Park, H.S.;Lee, H.W.
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.17 no.12
    • /
    • pp.1736-1740
    • /
    • 2004
  • Since the application of compost containing relatively high levels of heavy metals poses a potential hazard to plants and animals, it is important to know the heavy metal content of compost made with animal manure if it is to be used as a fertilizer. Measurement of heavy metal content in compost by chemical methods usually requires numerous reagents, skilled labor and expensive analytical equipment. The objective of this study, therefore, was to explore the application of near-infrared reflectance spectroscopy (NIRS), a nondestructive, cost-effective and rapid method, for the prediction of heavy metal content in compost. One hundred and seventy-two diverse compost samples were collected from forty-seven compost facilities located along the Han river in Korea, and were analyzed for Cr, As, Cd, Cu, Zn and Pb levels using inductively coupled plasma spectrometry. The samples were scanned using a Foss NIRSystem Model 6500 scanning monochromator from 400 to 2,500 nm at 2 nm intervals. Modified partial least squares (MPLS), partial least squares (PLS) and principal component regression (PCR) analyses were applied to develop the most reliable calibration model between the NIR spectral data and the calibration sample sets. The best-fit calibration model for measurement of heavy metal content in compost, MPLS, was used to validate the calibration equations with a similar sample set (n=30). The coefficient of simple correlation (r) and standard error of prediction (SEP) were Cr (0.82, 3.13 ppm), As (0.71, 3.74 ppm), Cd (0.76, 0.26 ppm), Cu (0.88, 26.47 ppm), Zn (0.84, 52.84 ppm) and Pb (0.60, 2.85 ppm), respectively. This study showed that NIRS is a feasible analytical method for the prediction of heavy metal content in compost.

Application of Informer for time-series NO2 prediction

  • Hye Yeon Sin;Minchul Kang;Joonsung Kang
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.7
    • /
    • pp.11-18
    • /
    • 2023
  • In this paper, we evaluate deep learning time series forecasting models. Recent studies show that these models perform better than traditional prediction models such as ARIMA. Among them are recurrent neural networks, which store previous information in the hidden layer. To solve the vanishing gradient problem in such networks, LSTM adds a small memory inside the recurrent neural network, and Bi-LSTM adds a hidden layer in the reverse direction of the data flow. We compared the performance of Informer with that of other models (LSTM, Bi-LSTM, and Transformer) on real nitrogen dioxide (NO2) data. To evaluate the accuracy of each method, the root mean square error and mean absolute error between the real and predicted values were obtained. Informer showed improved prediction accuracy compared with the other methods.