• Title/Abstract/Keywords: Training and Prediction

Search results: 953 items (processing time: 0.032 s)

The Prediction Ability of Genomic Selection in the Wheat Core Collection

  • Yuna Kang;Changsoo Kim
    • Korean Society of Crop Science: Conference Proceedings
    • /
    • Korean Society of Crop Science 2022 Fall Conference
    • /
    • pp.235-235
    • /
    • 2022
  • Genomic selection is a promising tool for plant and animal breeding that uses genome-wide molecular marker data to capture large- and small-effect quantitative trait loci and predict the genetic value of selection candidates. Genomic selection has previously been shown to achieve higher prediction accuracy than conventional marker-assisted selection (MAS) for quantitative traits. In this study, the prediction accuracy for 10 agricultural traits was compared in a wheat core collection of 567 accessions. We used a cross-validation approach to train the models and validate prediction accuracy, evaluating the effects of training population size and training model. Among the six models used (GBLUP, LASSO, BayesA, RKHS, SVM, RF), all except SVM reached a prediction accuracy of 0.4 or more for almost all traits. For traits such as days to heading and days to maturity, the prediction accuracy was very high, over 0.8. Prediction accuracy increased with the size of the training population for all traits. Prediction accuracy also differed with the genetic composition of the training population, regardless of its size. All training models were verified through 5-fold cross-validation. To verify the prediction ability of the wheat core collection as a training population, we compared the actual phenotypes with the genomic estimated breeding values (GEBVs) in a breeding population of 35 individuals. Of the 10 individuals with the earliest days to heading, 5 were selected through genomic selection, and of the 10 individuals with the latest days to heading, 6 were selected through genomic selection. We therefore confirmed that genomic selection makes it possible to select individuals for specific traits from genotype data alone, in a shorter period of time.
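The 5-fold cross-validation described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how accuracy was computed, so the sketch assumes the common convention of the Pearson correlation between predicted and observed values in each held-out fold. The six models from the abstract are not implemented; a single-predictor least-squares line stands in for them, and all function names are hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(x, y):
    """Ordinary least squares for one predictor (stand-in model, an assumption)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def cross_validated_accuracy(x, y, k=5, seed=0):
    """Mean held-out Pearson correlation over k folds."""
    accuracies = []
    for test in k_fold_indices(len(x), k, seed):
        held_out = set(test)
        train = [i for i in range(len(x)) if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        predicted = [slope * x[i] + intercept for i in test]
        accuracies.append(pearson(predicted, [y[i] for i in test]))
    return mean(accuracies)
```

Each individual is held out exactly once, so every prediction that enters the accuracy estimate comes from a model that never saw that individual's phenotype.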

Comparison and optimization of deep learning-based radiosensitivity prediction models using gene expression profiling in National Cancer Institute-60 cancer cell line

  • Kim, Euidam;Chung, Yoonsun
    • Nuclear Engineering and Technology
    • /
    • Vol. 54, No. 8
    • /
    • pp.3027-3033
    • /
    • 2022
  • Background: In this study, various types of deep-learning models for predicting in vitro radiosensitivity from gene-expression profiling were compared. Methods: The clonogenic surviving fractions at 2 Gy from previous publications and microarray gene-expression data from the National Cancer Institute-60 cell lines were used to measure the radiosensitivity. Seven prediction models were compared: three distinct multi-layered perceptrons (MLP) and four different convolutional neural networks (CNN). K-fold cross-validation was applied to train the models and evaluate their performance. A prediction was counted as correct if its absolute error was < 0.02 or its relative error was < 10%. The models were compared in terms of prediction accuracy, training time per epoch, training fluctuations, and required computational resources. Results: The strengths of the MLP-based models were their fast initial convergence and short training time per epoch. Their prediction accuracy differed significantly depending on the model configuration. The CNN-based models showed relatively high prediction accuracy, low training fluctuations, and a relatively small increase in the memory requirement as the model deepens. Conclusion: Our findings suggest that a CNN-based model with moderate depth is appropriate when prediction accuracy is important, and that a shallow MLP-based model can be recommended when either training resources or time are limited.
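The correctness criterion quoted in the abstract above (absolute error < 0.02 or relative error < 10%) can be written as a small check. This is a sketch only: it assumes that relative error is measured against the observed surviving fraction, which the abstract does not state, and the function names are hypothetical.

```python
def is_correct(predicted, observed, abs_tol=0.02, rel_tol=0.10):
    """True if the prediction meets either error criterion from the abstract."""
    abs_err = abs(predicted - observed)
    if abs_err < abs_tol:
        return True
    if observed == 0:
        # Relative error is undefined for a zero observation.
        return False
    # Assumption: relative error is taken with respect to the observed value.
    return abs_err / abs(observed) < rel_tol


def prediction_accuracy(predictions, observations):
    """Fraction of predictions counted as correct."""
    hits = sum(is_correct(p, o) for p, o in zip(predictions, observations))
    return hits / len(predictions)
```

Combining the two tolerances with "or" keeps small surviving fractions from being penalised by the relative criterion, and large ones from being penalised by the absolute criterion.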

Semi-supervised Software Defect Prediction Model Based on Tri-training

  • Meng, Fanqi;Cheng, Wenying;Wang, Jingdong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 15, No. 11
    • /
    • pp.4028-4042
    • /
    • 2021
  • To address the difficulty of software defect prediction caused by insufficient labelled defect samples and class imbalance, a semi-supervised software defect prediction model is proposed that combines feature normalization, over-sampling, and the Tri-training algorithm. First, feature normalization is used to smooth the feature data and eliminate the influence of excessively large or small feature values on the model's classification performance. Second, oversampling is used to expand the data, which resolves the class imbalance among the labelled samples. Finally, the Tri-training algorithm learns from the training samples and establishes a defect prediction model. The novelty of this model is that it effectively combines feature normalization, oversampling, and the Tri-training algorithm to solve both the scarcity of labelled samples and the class imbalance problem. Simulation experiments on the NASA software defect prediction dataset show that the proposed method outperforms four existing supervised and semi-supervised learning methods in terms of Precision, Recall, and F-Measure.
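The first two preprocessing steps named in the abstract above can be sketched as follows. The abstract does not say which normalization or oversampling method was used, so min-max scaling and random oversampling with replacement are assumptions chosen for illustration. The Tri-training step itself is not shown, and the function names are hypothetical.

```python
import random


def min_max_normalize(rows):
    """Scale every feature column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    # A constant column would give a zero span; fall back to 1.0 to avoid dividing by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equally large."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels
```

Oversampling is applied only to the labelled samples, since the class of an unlabelled sample is unknown by definition.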

Wind Prediction with a Short-range Multi-Model Ensemble System

  • 윤지원;이용희;이희춘;하종철;이희상;장동언
    • Atmosphere (대기)
    • /
    • Vol. 17, No. 4
    • /
    • pp.327-337
    • /
    • 2007
  • In this study, we examined a new ensemble training approach to reduce systematic error and improve the prediction skill for wind using the Short-range Ensemble prediction system (SENSE), a mesoscale multi-model ensemble prediction system. SENSE has 16 ensemble members based on MM5, WRF ARW, and WRF NMM. We evaluated the skill of surface wind prediction against AWS (Automatic Weather Station) observations during the summer season (June to August 2006). In the first stage, the initial state of each member was corrected with respect to the observed values; the corrected members then entered a training stage to find an adaptive weight function, formulated in terms of the Root Mean Square Vector Error (RMSVE). Sensitivity experiments on the training interval showed that the optimal training period was 1 day. The resulting weighted ensemble average showed smaller errors in the spatial and temporal patterns of wind speed than the simple ensemble average.
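A weighted ensemble average driven by RMSVE can be sketched as follows. The abstract above does not give the form of the weight function, so weights proportional to the inverse of each member's RMSVE over the training period are an assumption made purely for illustration. The function names are hypothetical, and winds are represented as (u, v) component pairs.

```python
import math


def rmsve(forecasts, observations):
    """Root mean square vector error between forecast and observed (u, v) winds."""
    squared = [
        (uf - uo) ** 2 + (vf - vo) ** 2
        for (uf, vf), (uo, vo) in zip(forecasts, observations)
    ]
    return math.sqrt(sum(squared) / len(squared))


def inverse_error_weights(errors):
    """Normalized weights proportional to 1 / error (assumed form; errors must be > 0)."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_mean_wind(member_winds, weights):
    """Weighted average of the members' (u, v) forecasts for one time and place."""
    u = sum(w * wind[0] for w, wind in zip(weights, member_winds))
    v = sum(w * wind[1] for w, wind in zip(weights, member_winds))
    return (u, v)
```

Setting every weight to 1/N recovers the simple ensemble average that the abstract uses as its baseline.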

A Study of Improvement of a Prediction Accuracy about Wind Resources based on Training Period of Bayesian Kalman Filter Technique

  • 이순환
    • Journal of the Korean Earth Science Society
    • /
    • Vol. 38, No. 1
    • /
    • pp.11-23
    • /
    • 2017
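  • The short-term predictability of wind resources is an important factor in evaluating the economic feasibility of a wind farm. In this study, a Bayesian Kalman filter was applied as a post-processing step as one way to improve the short-term predictability of wind resources. A Bayesian Kalman training period of a certain length is required to evaluate the correlation between the estimated model and the observation data. This study quantitatively analyzed the prediction characteristics for several training periods. In the Taebaek area, predicting temperature and wind speed with a short 3-day Bayesian Kalman training period gave better prediction performance than the other training periods. At Ieodo, in contrast, the best prediction performance was obtained with a Bayesian Kalman filter training period of 6 days or more. The improvement from the Bayesian Kalman filter is pronounced in cases where WRF prediction performance is poor; conversely, at points where the WRF prediction is accurate, the improvement from applying the filter tends to be weak.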
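How a training period feeds a Kalman-filter post-processor can be sketched with a scalar filter that tracks the forecast bias. The abstract above does not specify the filter, so this is a generic textbook formulation and not the author's method: the state is the forecast bias, the noise variances `q` and `r` are assumed values, and the function names are hypothetical.

```python
def train_bias(forecasts, observations, q=0.01, r=1.0, bias=0.0, p=1.0):
    """Run a scalar Kalman filter over the training period and return the bias estimate.

    q: assumed process-noise variance (how fast the true bias may drift)
    r: assumed observation-noise variance
    p: initial variance of the bias estimate
    """
    for forecast, observed in zip(forecasts, observations):
        p += q                                   # predict: uncertainty grows
        gain = p / (p + r)                       # Kalman gain
        bias += gain * ((forecast - observed) - bias)  # update with today's error
        p *= 1.0 - gain                          # uncertainty shrinks after update
    return bias


def corrected_forecast(raw_forecast, bias):
    """Remove the estimated bias from a new raw model forecast."""
    return raw_forecast - bias
```

With a short training period the estimate stays closer to its initial value, while a longer period lets it converge further, which is one way to read the site-dependent optimum (3 days versus 6 days or more) reported in the abstract.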

Generating and Validating Synthetic Training Data for Predicting Bankruptcy of Individual Businesses

  • Hong, Dong-Suk;Baik, Cheol
    • Journal of information and communication convergence engineering
    • /
    • Vol. 19, No. 4
    • /
    • pp.228-233
    • /
    • 2021
  • In this study, we analyze the credit information (loans, delinquency information, etc.) of individual business owners and generate voluminous training data for a bankruptcy prediction model using a partial synthetic training technique. Furthermore, we evaluate the prediction performance obtained with the newly generated data against that obtained with the actual data. In the experiments (a logistic regression task), using training data generated with conditional tabular generative adversarial networks (CTGAN) improved recall by a factor of 1.75 compared with using the actual data. The probability that the actual and generated data are drawn from an identical distribution is verified to be much higher than 80%. Providing artificial intelligence training data through data synthesis in credit rating and default risk prediction for individual businesses, fields in which research has been relatively inactive, promotes further in-depth research on such methods.

Software Fault Prediction using Semi-supervised Learning Methods

  • 홍의석
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • Vol. 19, No. 3
    • /
    • pp.127-133
    • /
    • 2019
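  • Most studies of software fault prediction concern supervised models that use labelled data as training data. Supervised models achieve high prediction performance, but most development groups do not hold enough labelled data. Unsupervised models, which use only unlabelled data for training, are difficult to build and perform poorly. Semi-supervised models, which use both labelled and unlabelled data as training data, address these problems. Self-training is the semi-supervised technique with the fewest assumptions and constraints. In this paper, several models were implemented using Self-training algorithms and evaluated with Accuracy and AUC; the YATSI model showed the best performance.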
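The general Self-training loop mentioned in the abstract above can be sketched as follows. This is a generic illustration, not the YATSI model or any other model from the paper: the base learner is a one-dimensional nearest-centroid classifier, confidence is measured by the distance margin between the two closest centroids, and both choices, like the function names, are assumptions. At least two classes must be present in the labelled data.

```python
def centroids(points, labels):
    """Mean feature value per class (1-D nearest-centroid model)."""
    result = {}
    for cls in set(labels):
        values = [x for x, y in zip(points, labels) if y == cls]
        result[cls] = sum(values) / len(values)
    return result


def predict_with_margin(x, cents):
    """Return the nearest class and how much closer it is than the runner-up."""
    ranked = sorted(cents, key=lambda c: abs(x - cents[c]))
    best, second = ranked[0], ranked[1]
    return best, abs(x - cents[second]) - abs(x - cents[best])


def self_train(labelled_x, labelled_y, unlabelled_x, min_margin=1.0, max_rounds=10):
    """Repeatedly pseudo-label confident unlabelled points and retrain."""
    xs, ys, pool = list(labelled_x), list(labelled_y), list(unlabelled_x)
    for _ in range(max_rounds):
        cents = centroids(xs, ys)
        remaining, added = [], 0
        for x in pool:
            label, margin = predict_with_margin(x, cents)
            if margin >= min_margin:
                xs.append(x)          # accept the pseudo-label
                ys.append(label)
                added += 1
            else:
                remaining.append(x)   # too uncertain; try again next round
        pool = remaining
        if added == 0:
            break
    return centroids(xs, ys)
```

Points that never reach the confidence threshold are simply left out, which limits the risk of reinforcing wrong pseudo-labels.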

Voting and Ensemble Schemes Based on CNN Models for Photo-Based Gender Prediction

  • Jhang, Kyoungson
    • Journal of Information Processing Systems
    • /
    • Vol. 16, No. 4
    • /
    • pp.809-819
    • /
    • 2020
  • Gender prediction accuracy increases as convolutional neural network (CNN) architectures evolve. This paper compares voting and ensemble schemes that use five already-trained CNN models to further improve gender prediction accuracy. Majority voting usually requires an odd number of models, whereas the proposed softmax-based voting can use any number of models to improve accuracy. An ensemble of CNN models combined with one additional fully-connected layer requires further tuning or training of the combined models. Experiments show that voting or ensembling CNN models further improves gender prediction accuracy and, in particular, that softmax-based voters always show better gender prediction accuracy than majority voters. Compared with softmax-based voters, ensemble models show slightly better or similar accuracy at the cost of additional training of the combined CNN models. Softmax-based voting can be a fast and efficient way to obtain better accuracy without further training, since selecting the most accurate of the available pre-trained CNN models usually yields accuracy similar to that of the corresponding ensemble models.
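The difference between the two voting schemes in the abstract above can be sketched as follows. The abstract does not define softmax-based voting precisely, so the sketch assumes it means summing each model's softmax probability vector and taking the class with the largest total. The function names are hypothetical.

```python
from collections import Counter


def majority_vote(prob_vectors):
    """Each model votes for its most probable class; the most frequent vote wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in prob_vectors]
    return Counter(votes).most_common(1)[0][0]


def softmax_vote(prob_vectors):
    """Sum the per-class softmax outputs over models and pick the largest total."""
    n_classes = len(prob_vectors[0])
    totals = [sum(p[c] for p in prob_vectors) for c in range(n_classes)]
    return max(range(n_classes), key=totals.__getitem__)


# Two models lean weakly towards class 0, one is very sure of class 1.
outputs = [[0.55, 0.45], [0.55, 0.45], [0.05, 0.95]]
print(majority_vote(outputs), softmax_vote(outputs))
```

Because it uses the confidence of each model rather than only its top choice, the summed-probability vote can break ties, which is why it does not need an odd number of models.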

Pipeline wall thinning rate prediction model based on machine learning

  • Moon, Seongin;Kim, Kyungmo;Lee, Gyeong-Geun;Yu, Yongkyun;Kim, Dong-Jin
    • Nuclear Engineering and Technology
    • /
    • Vol. 53, No. 12
    • /
    • pp.4060-4066
    • /
    • 2021
  • Flow-accelerated corrosion (FAC) of carbon steel piping is a significant problem in nuclear power plants. The basic process of FAC is currently understood relatively well; however, the accuracy of prediction models of the wall-thinning rate under an FAC environment is not reliable. Herein, we propose a methodology to construct pipe wall-thinning rate prediction models using artificial neural networks and a convolutional neural network, which is confined to a straight pipe without geometric changes. Furthermore, a methodology to generate training data is proposed to efficiently train the neural network for the development of a machine learning-based FAC prediction model. Consequently, it is concluded that machine learning can be used to construct pipe wall thinning rate prediction models and optimize the number of training datasets for training the machine learning algorithm. The proposed methodology can be applied to efficiently generate a large dataset from an FAC test to develop a wall thinning rate prediction model for a real situation.

A Survey of Applications of Artificial Intelligence Algorithms in Eco-environmental Modelling

  • Kim, Kang-Suk;Park, Joon-Hong
    • Environmental Engineering Research
    • /
    • Vol. 14, No. 2
    • /
    • pp.102-110
    • /
    • 2009
  • The application of artificial intelligence (AI) approaches in eco-environmental modeling has gradually increased over the last decade. A comprehensive understanding and evaluation of the applicability of these approaches to eco-environmental modeling is needed. In this study, we reviewed previous studies that used AI techniques in eco-environmental modeling. Decision Tree (DT) and Artificial Neural Network (ANN) were found to be the major AI algorithms preferred by researchers in ecological and environmental modeling. When the effect of training data size on model prediction accuracy was explored using data from the previous studies, prediction accuracy and training data size showed a nonlinear correlation, which was best described by a hyperbolic saturation function among the nonlinear functions tested, including power and logarithmic functions. The hyperbolic saturation equations are proposed as a guideline for optimizing the size of the training data set, which is critically important in designing the field experiments required to train AI-based eco-environmental models.
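A hyperbolic saturation relationship between training data size and accuracy can be sketched as follows. The abstract above does not give the equation or its fitted parameters, so the form accuracy = a_max * n / (k_half + n) is an assumption, and the parameter names and example values are hypothetical, not results from the paper.

```python
def saturation_accuracy(n, a_max, k_half):
    """Assumed hyperbolic saturation curve.

    a_max:  accuracy approached as the training set grows without bound
    k_half: training set size at which accuracy reaches a_max / 2
    """
    return a_max * n / (k_half + n)


def samples_for_fraction(fraction, k_half):
    """Training set size needed to reach a given fraction (0 to 1, exclusive) of a_max."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return k_half * fraction / (1.0 - fraction)


# Hypothetical parameters, for illustration only.
print(saturation_accuracy(100, a_max=0.9, k_half=100))
print(samples_for_fraction(0.9, k_half=100))
```

Under this form, reaching 90% of the asymptotic accuracy takes nine times the half-saturation sample size, which illustrates how such a curve can guide the choice of training set size.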