• Title/Summary/Keyword: Prediction models

Severity-based Software Quality Prediction using Class Imbalanced Data

  • Hong, Euy-Seok;Park, Mi-Kyeong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.21 no.4
    • /
    • pp.73-80
    • /
    • 2016
  • Most fault prediction models suffer from class imbalance because training data usually contain many more non-fault modules than fault modules. This imbalanced distribution makes it difficult for the models to learn the minority class. The imbalance is even greater in severity-based fault prediction, because high-severity fault modules are a smaller subset of the fault modules. In this paper, we propose severity-based models that address these problems using three sampling methods: Resample, SpreadSubSample, and SMOTE. Empirical results show that the Resample method has typical over-fitting problems, and the SpreadSubSample method cannot enhance the prediction performance of the models. Unlike these two methods, SMOTE shows good performance in terms of AUC and FNR values. In particular, the J48 decision tree model using SMOTE outperforms the other prediction models.
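The core idea of SMOTE named in the abstract above can be illustrated with a minimal sketch: a synthetic minority sample is placed at a random point on the segment between an existing minority sample and one of its nearest minority neighbours. This is not the implementation used in the paper; the function name `smote`, the parameter defaults, and the example module metrics are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical 2-D metrics for four high-severity fault modules
faulty = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(faulty, n_new=6)
```

Because every synthetic point is an interpolation between two real minority samples, it always stays inside the region already covered by the minority class, which is one reason the method tends to over-fit less than plain duplication of samples.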

TRAFFIC-FLOW-PREDICTION SYSTEMS BASED ON UPSTREAM TRAFFIC (교통량예측모형의 개발과 평가)

  • 김창균
    • Proceedings of the KOR-KST Conference
    • /
    • 1995.02a
    • /
    • pp.84-98
    • /
    • 1995
  • Network-based models were developed to predict short-term future traffic volume from current traffic, the historical average, and upstream traffic. It is presumed that upstream traffic volume can be used to predict the downstream traffic in a specific time period. Three models were developed for traffic flow prediction: a combination of historical average and upstream traffic, a combination of current traffic and upstream traffic, and a combination of all three variables. The three models were evaluated using regression analysis. The third model was found to provide the best prediction for the analyzed data. To balance the variables appropriately according to the present traffic condition, a heuristic adaptive weighting system was devised based on the relationships between the beginning period of prediction and the previous periods. The developed models were applied to 15-minute freeway data obtained by regular induction loop detectors. The prediction models were shown to be capable of producing reliable and accurate forecasts under congested traffic conditions. The prediction systems perform better in the 15-minute range than in the 30- to 45-minute ranges. It was also found that the combined models usually produce more consistent forecasts than the historical average.

Purchase Prediction by Analyzing Users' Online Behaviors Using Machine Learning and Information Theory Approaches

  • Kim, Minsung;Im, Il;Han, Sangman
    • Asia pacific journal of information systems
    • /
    • v.26 no.1
    • /
    • pp.66-79
    • /
    • 2016
  • The availability of detailed data on customers' online behaviors and advances in big data analysis techniques enable us to predict consumer behaviors. In the past, researchers have built purchase prediction models by analyzing clickstream data; however, these clickstream-based prediction models have had several limitations. In this study, we propose a new method for purchase prediction that combines information theory with machine learning techniques. Clickstreams from 5,000 panel members and data on their purchases of electronics, fashion, and cosmetics products were analyzed. Clickstreams were summarized using the 'entropy' concept from information theory, while the 'random forests' method was applied to build prediction models. The results show that the prediction accuracy of this new method ranges from 0.56 to 0.83, which is a significant improvement over values for clickstream-based prediction models presented in the past. The results further indicate that consumers' information search behaviors differ significantly across product categories.
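One way a clickstream could be summarized with entropy is sketched below, assuming each click is labelled with a category and the Shannon entropy of the category distribution is used as a feature. The abstract does not say how the authors defined their entropy measure, so the function name and the example data are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(events):
    """Shannon entropy (in bits) of the empirical distribution of events."""
    counts = Counter(events)
    total = len(events)
    # Sum of p * log2(1 / p) over the observed categories
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical clickstreams: one user browses a single category,
# another spreads clicks evenly over four categories.
focused = ["laptop"] * 8
scattered = ["laptop", "phone", "camera", "tv"] * 2

focused_entropy = shannon_entropy(focused)      # 0 bits: no uncertainty
scattered_entropy = shannon_entropy(scattered)  # 2 bits: four equally likely categories
```

A low value indicates a focused search and a high value indicates scattered browsing, so a single number per user or session can be passed to a classifier such as a random forest.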

Application of deep learning with bivariate models for genomic prediction of sow lifetime productivity-related traits

  • Joon-Ki Hong;Yong-Min Kim;Eun-Seok Cho;Jae-Bong Lee;Young-Sin Kim;Hee-Bok Park
    • Animal Bioscience
    • /
    • v.37 no.4
    • /
    • pp.622-630
    • /
    • 2024
  • Objective: Pig breeders cannot obtain phenotypic information at the time of selection for sow lifetime productivity (SLP). They would benefit from obtaining genetic information on candidate sows. Genomic data interpreted using deep learning (DL) techniques could contribute to the genetic improvement of SLP to maximize farm profitability because DL models capture nonlinear genetic effects such as dominance and epistasis more efficiently than conventional genomic prediction methods based on linear models. This study aimed to investigate the usefulness of DL for the genomic prediction of two SLP-related traits: lifetime number of litters (LNL) and lifetime pig production (LPP). Methods: Two bivariate DL models, convolutional neural network (CNN) and local convolutional neural network (LCNN), were compared with conventional bivariate linear models (i.e., genomic best linear unbiased prediction, Bayesian ridge regression, Bayes A, and Bayes B). Phenotype and pedigree data were collected from 40,011 sows that had husbandry records. Among these, 3,652 pigs were genotyped using the PorcineSNP60K BeadChip. Results: The best predictive correlation for LNL was obtained with CNN (0.28), followed by LCNN (0.26) and conventional linear models (approximately 0.21). For LPP, the best predictive correlation was also obtained with CNN (0.29), followed by LCNN (0.27) and conventional linear models (approximately 0.25). A similar trend was observed with the mean squared error of prediction for the SLP traits. Conclusion: This study provides an example of a CNN that can outperform linear model-based genomic prediction approaches when nonlinear interaction components are important, because LNL and LPP exhibited strong epistatic interaction components. Additionally, our results suggest that applying bivariate DL models could also improve prediction accuracy by utilizing the genetic correlation between LNL and LPP.

Assessment of Prediction Ability of Atomization and Droplet Breakup Models on Diesel Spray Dynamic (디젤분무에서 미립화 및 액적분열모델의 예측능력평가)

  • Kim, J.I.;No, S.Y.
    • Journal of ILASS-Korea
    • /
    • v.5 no.2
    • /
    • pp.35-42
    • /
    • 2000
  • A number of atomization and droplet breakup models have been developed and used to predict diesel spray characteristics. Of the many atomization and droplet breakup models based on the breakup mechanism due to aerodynamic liquid and gas interaction, four models classified as mathematical models (TAB, modified TAB, DDB, and WB) and one hybrid model based on the WB and TAB models were selected to assess their ability to predict diesel spray dynamics. The models were assessed with the KIVA-II code by comparing their predictions with experimental data on spray tip penetration and Sauter mean diameter (SMD) from the literature. It was found that the prediction of spray tip penetration and SMD by the hybrid model was only influenced by the initial parcel number. All the atomization and droplet breakup models considered here were strongly dependent on the grid resolution. Therefore, it is important to check the grid resolution to obtain acceptable results when selecting a model. At low injection pressure, only the modified TAB model gave good agreement with the experimental data on spray tip penetration, and both the modified TAB and DDB models can be recommended for the prediction of SMD. At high injection pressure, only the hybrid model gave good agreement with the experimental data on spray tip penetration, and the predictions of all of the selected models did not match the experimental data. Spray tip penetration increased with increasing $B_1$, whereas increasing $B_1$ did not affect the prediction of SMD.

Comparison of long-term forecasting performance of export growth rate using time series analysis models and machine learning analysis (시계열 분석 모형 및 머신 러닝 분석을 이용한 수출 증가율 장기예측 성능 비교)

  • Seong-Hwi Nam
    • Korea Trade Review
    • /
    • v.46 no.6
    • /
    • pp.191-209
    • /
    • 2021
  • In this paper, various time series analysis models and machine learning models are presented for long-term prediction of the export growth rate, and their prediction performance is compared using RMSE and MAE. The export growth rate is one of the major indicators used to evaluate economic status, and it is also used in economic forecasting. The export growth rate may take negative (-) as well as positive (+) values. Therefore, instead of the ReLU function, which is often used for time series prediction with deep learning models, the PReLU function, which can output negative (-) values, was used as the activation function of the deep learning models. The time series prediction performance of each model was compared for three types of data. Long-term forecasts of the export growth rate were produced by three forecast methods: a fixed forecast method, a recursive forecast method, and a rolling forecast method. In the results, the traditional time series analysis model, ARDL, showed excellent performance, but as the time period of the training data increased, the performance of machine learning models, including LSTM, improved in relative terms.
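The difference between the two activation functions mentioned above can be shown with a minimal sketch. In a real network the PReLU slope `alpha` is a learned parameter; the fixed value 0.25 used here is only an assumed starting value for illustration, and the sample growth rates are hypothetical.

```python
def relu(x):
    """ReLU: passes positive inputs through and clips negative inputs to zero."""
    return max(0.0, x)


def prelu(x, alpha=0.25):
    """PReLU: like ReLU, but negative inputs are scaled by alpha instead of clipped.

    alpha is learned during training in practice; 0.25 is an assumed value here.
    """
    return x if x >= 0 else alpha * x


# Hypothetical export growth rates (percent), including negative values
growth_rates = [3.0, -4.0, 0.0]
relu_out = [relu(g) for g in growth_rates]    # negative rate is lost
prelu_out = [prelu(g) for g in growth_rates]  # negative rate keeps its sign
```

With ReLU every negative input yields the same output of zero, whereas PReLU keeps a signal that distinguishes small declines from large ones.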

The Accuracy of Prediction Models in Burn Patients (화상환자에서 사망예측모델의 성능 평가에 관한 연구)

  • Woo, Jaeyeon;Kym, Dohern
    • Journal of the Korean Burn Society
    • /
    • v.24 no.1
    • /
    • pp.1-6
    • /
    • 2021
  • Purpose: The purpose of this study was to evaluate the accuracy of four prediction models in adult burn patients. Methods: This retrospective study was conducted on 696 adult burn patients who were treated at the burn intensive care unit (BICU) of Hallym University Hangang Sacred Heart Hospital from January 2017 to December 2019. The models are ABSI, APACHE IV, rBaux, and the Hangang score. Results: The discrimination of each prediction model was analyzed as the AUC of the ROC curve. The AUC was highest for the Hangang score at 0.931 (0.908~0.954), followed by rBaux at 0.896 (0.867~0.924), ABSI at 0.883 (0.853~0.913), and APACHE IV at 0.851 (0.818~0.884). Conclusion: Among the four models evaluated, the Hangang score showed the highest predictive accuracy. However, it is necessary to apply the appropriate prediction model according to the characteristics of the burn center.
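The AUC used above to compare the models can be read as the probability that a randomly chosen patient who died receives a higher score than a randomly chosen survivor, with ties counted as one half. The sketch below computes it directly from that pairwise definition. The scores and outcomes are made-up illustrative values, not data from the study.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the event (e.g. death), 0 otherwise. Ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predicted mortality risks and observed outcomes
risk = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
died = [1, 1, 0, 1, 0, 0]
model_auc = auc(risk, died)  # 8 of 9 pairs are ranked correctly
```

The pairwise form costs time proportional to the number of positive-negative pairs, which is acceptable for a few hundred patients; rank-based formulas give the same value more efficiently on large data.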

Performance Evaluation of a Feature-Importance-based Feature Selection Method for Time Series Prediction

  • Hyun, Ahn
    • Journal of information and communication convergence engineering
    • /
    • v.21 no.1
    • /
    • pp.82-89
    • /
    • 2023
  • Various machine-learning models may yield high predictive power for time series prediction on massive time series. However, these models are prone to instability in terms of computational cost because of the high dimensionality of the feature space and nonoptimized hyperparameter settings. Considering the potential risk that model training with a high-dimensional feature set can be time-consuming, we evaluate a feature-importance-based feature selection method to derive a tradeoff between predictive power and computational cost for time series prediction. We used two machine learning techniques for performance evaluation to generate prediction models from a retail sales dataset. First, we ranked the features using impurity-based and Local Interpretable Model-agnostic Explanations (LIME)-based feature importance measures in the prediction models. Then, the recursive feature elimination method was applied to eliminate unimportant features sequentially. Consequently, we obtained a subset of features that could lead to reduced model training time while preserving acceptable model performance.

Structural monitoring and maintenance by quantitative forecast model via gray models

  • C.C. Hung;T. Nguyen
    • Structural Monitoring and Maintenance
    • /
    • v.10 no.2
    • /
    • pp.175-190
    • /
    • 2023
  • This article aims to quantitatively predict the snowmelt in extreme cold regions, considering a combination of gray and neural network models. The traditional non-equidistant GM(1,1) prediction model is optimized by adjusting the time-distance weight matrix, optimizing the background value of the differential equation and optimizing the initial value of the model, and using the BP neural network for the first. The adjusted ice forecast model has an accuracy of 0.984 and posterior variance and the average forecast error value is 1.46%. Compared with the GM(1,1) and BP network models, the accuracy of the prediction results has been significantly improved, and the quantitative prediction of the ice sheet is more accurate. Structural monitoring and maintenance using a quantitative prediction model based on gray models was clearly demonstrated.
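For orientation, the baseline that the abstract above sets out to improve can be sketched as the textbook equidistant GM(1,1) model: accumulate the series, fit the grey equation x0(k) + a·z(k) = b by least squares, and difference the fitted exponential. This is not the optimized non-equidistant variant described in the article; the function name and the sample series are assumptions made for illustration.

```python
import math


def gm11_forecast(x0, steps=1):
    """Fit a standard equidistant GM(1,1) model and forecast `steps` values ahead."""
    n = len(x0)
    # Accumulated generating operation (1-AGO): running sums of the raw series
    x1 = [sum(x0[: i + 1]) for i in range(n)]
    # Background values: means of consecutive accumulated values
    z = [0.5 * (x1[i] + x1[i - 1]) for i in range(1, n)]
    y = x0[1:]
    # Least squares for the grey equation x0(k) + a * z(k) = b,
    # i.e. the straight line y = -a * z + b
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(v * w for v, w in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    a = -slope
    b = (sy - slope * sz) / m

    def x1_hat(k):
        """Fitted accumulated value at 0-based index k."""
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Restore forecasts of the raw series by differencing the accumulated fit
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


# Hypothetical series growing by 10% per step; the true next value is 161.051
series = [100.0, 110.0, 121.0, 133.1, 146.41]
forecast = gm11_forecast(series, steps=1)
```

The averaged background value `z` and the choice of `x0[0]` as the initial condition are the two places the abstract says the article optimizes, so they are the natural points to modify when experimenting with variants.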

A Study on the Improvement of the Road Traffic Noise Prediction for Environmental Impact Assessment (환경영향평가시 도로교통소음예측에 관한 개선방안 연구)

  • Lee, Nae-Hyun;Park, Young-Min;Sunwoo, Young
    • Journal of Environmental Impact Assessment
    • /
    • v.10 no.4
    • /
    • pp.297-304
    • /
    • 2001
  • Road traffic noise has recently emerged as a significant environmental issue because of the dramatic increase in vehicles and the expansion of newly constructed roads. Therefore, this study proposes a method that improves prediction factors and models through analysis of the existing road traffic noise prediction model. Prediction factors can be improved by establishing a guideline for diffraction attenuation and by applying daily traffic discharge, peak traffic discharge, and average traveling speed obtained through a level-of-service analysis. Predictions must be made at intervals of one or five years over a 20-year period. Prediction models can also be improved by building a database, establishing functional relations between physical properties and noise levels through acoustic analysis, and developing models for road traffic noise prediction in residential areas.