• Title/Summary/Keyword: prediction accuracy

Accuracy improvement of a collaborative filtering recommender system (협력적 필터링 추천 시스템의 정확도 향상)

  • Lee, Seog-Hwan;Park, Seung-Hun
    • Journal of the Korea Safety Management & Science
    • /
    • v.12 no.1
    • /
    • pp.127-136
    • /
    • 2010
  • In this paper, the author proposes two methods to improve the accuracy of a recommender system. First, in order to classify users more accurately, an EMC (Expanded Moving Center) heuristic algorithm is used, which improves clustering accuracy. Second, a neighborhood-oriented preference prediction method is proposed that improves on conventional preference prediction methods and thereby improves the accuracy of the recommender system. In tests, the recommender system that adopted these two methods was more accurate than conventional recommendation methods. (A hedged sketch of the cluster-then-predict idea follows below.)
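
Neither the EMC heuristic nor the exact neighborhood-oriented prediction rule is specified in the abstract, so the sketch below is only a generic illustration of the two-stage idea under stated assumptions: cluster users (here with a plain k-means-style step standing in for EMC), then predict a missing rating from same-cluster neighbors weighted by cosine similarity. All data, names, and parameters are illustrative, not the paper's method.

```python
import numpy as np

# Toy user-item rating matrix (0 = unrated). Purely illustrative data.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cluster_users(R, k=2, iters=20, seed=0):
    """Plain k-means on the rating rows; a stand-in for the EMC heuristic."""
    rng = np.random.default_rng(seed)
    centers = R[rng.choice(len(R), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((R[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = R[labels == c].mean(axis=0)
    return labels

def predict(R, labels, user, item):
    """Predict a missing rating from same-cluster neighbors, weighted by cosine similarity."""
    neighbors = [u for u in range(len(R))
                 if u != user and labels[u] == labels[user] and R[u, item] > 0]
    if not neighbors:
        return R[R[:, item] > 0, item].mean()   # fall back to the item mean
    sims = [np.dot(R[user], R[u]) / (np.linalg.norm(R[user]) * np.linalg.norm(R[u]) + 1e-9)
            for u in neighbors]
    return float(np.average([R[u, item] for u in neighbors], weights=sims))

labels = cluster_users(R)
print(predict(R, labels, user=0, item=2))       # estimate user 0's rating of item 2
```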

Balanced Accuracy and Confidence Probability of Interval Estimates

  • Liu, Yi-Hsin;Stan Lipovetsky;Betty L. Hickman
    • International Journal of Reliability and Applications
    • /
    • v.3 no.1
    • /
    • pp.37-50
    • /
    • 2002
  • Simultaneous estimation of the accuracy and the probability corresponding to a prediction interval is considered in this study. The traditional application of confidence-interval forecasting consists in evaluating the interval limits for a given significance level: the wider the interval, the higher the probability and the lower the forecast precision. In this paper a measure of stochastic forecast accuracy is introduced, and a procedure for balanced estimation of both the prediction accuracy and the confidence probability is elaborated. The solution can be obtained through an optimization approach. The suggested method is applied to constructing confidence intervals for parameters estimated by the normal and t distributions. (A hedged sketch of the width-versus-coverage trade-off follows below.)
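
The paper's balanced-estimation procedure is not reproduced here; the following is only a sketch of the trade-off it addresses, scanning a grid of confidence probabilities for normal-theory intervals and scoring each by an illustrative balance criterion (coverage minus normalized half-width). The criterion, the data, and the grid are assumptions for demonstration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=10.0, scale=2.0, size=50)    # illustrative data
mean, se = sample.mean(), sample.std(ddof=1) / np.sqrt(len(sample))

best = None
for p in np.arange(0.50, 0.995, 0.005):              # candidate confidence probabilities
    half_width = stats.t.ppf(0.5 + p / 2, df=len(sample) - 1) * se
    # Illustrative "balance" score: reward coverage, penalize (normalized) interval width.
    score = p - half_width / sample.std(ddof=1)
    if best is None or score > best[0]:
        best = (score, p, half_width)

score, p, hw = best
print(f"balanced choice: {p:.3f} confidence, interval {mean - hw:.2f} .. {mean + hw:.2f}")
```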

Evaluation of accuracies of genomic predictions for body conformation traits in Korean Holstein

  • Md Azizul Haque;Mohammad Zahangir Alam;Asif Iqbal;Yun Mi Lee;Chang Gwon Dang;Jong Joo Kim
    • Animal Bioscience
    • /
    • v.37 no.4
    • /
    • pp.555-566
    • /
    • 2024
  • Objective: This study aimed to assess the genetic parameters and accuracy of genomic predictions for twenty-four linear body conformation traits and overall conformation scores in Korean Holstein dairy cows. Methods: A dataset of 2,206 Korean Holsteins was collected, and genotyping was performed using the Illumina Bovine 50K single nucleotide polymorphism (SNP) chip. The traits investigated included body traits (stature, height at front end, chest width, body depth, angularity, body condition score, and locomotion), rump traits (rump angle, rump width, and loin strength), feet and leg traits (rear leg set, rear leg rear view, foot angle, heel depth, and bone quality), udder traits (udder depth, udder texture, udder support, fore udder attachment, front teat placement, front teat length, rear udder height, rear udder width, and rear teat placement), and overall conformation score. Accuracy of genomic predictions was assessed using the single-trait animal model genomic best linear unbiased prediction method implemented in the ASReml-SA v4.2 software. Results: Heritability estimates ranged from 0.10 to 0.50 for body traits, 0.21 to 0.35 for rump traits, 0.13 to 0.29 for feet and leg traits, and 0.05 to 0.46 for udder traits. Rump traits exhibited the highest average heritability (0.29), while feet and leg traits had the lowest estimates (0.21). Accuracy of genomic predictions varied among the twenty-four linear body conformation traits, ranging from 0.26 to 0.49. The heritability and prediction accuracy of genomic estimated breeding value (GEBV) for the overall conformation score were 0.45 and 0.46, respectively. The GEBVs for body conformation traits in Korean Holstein cows had low accuracy, falling below the 50% threshold. Conclusion: The limited response to selection for body conformation traits in Korean Holsteins may be attributed to both the low heritability of these traits and the lower accuracy estimates for GEBVs. Further research is needed to enhance the accuracy of GEBVs and improve the selection response for these traits.
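
The study fits a single-trait GBLUP animal model in ASReml-SA, which is not reproduced here. The sketch below is only a minimal numpy illustration of the same idea on simulated data: build a VanRaden-style genomic relationship matrix from SNP genotypes and solve the mixed-model equations for GEBVs. The simulated markers, the assumed heritability, and the accuracy check are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n_animals, n_snps = 200, 1000

# Simulated 0/1/2 genotypes and a phenotype with heritability ~0.3 (illustrative only).
p = rng.uniform(0.1, 0.9, n_snps)                      # allele frequencies
M = rng.binomial(2, p, size=(n_animals, n_snps)).astype(float)
effects = rng.normal(0, 0.05, n_snps)
tbv = M @ effects                                      # true breeding values
y = tbv + rng.normal(0, tbv.std() * np.sqrt(7 / 3), n_animals)

# VanRaden-style genomic relationship matrix.
Z = M - 2 * p
G = Z @ Z.T / (2 * np.sum(p * (1 - p)))
G += np.eye(n_animals) * 1e-3                          # keep G invertible

# Mixed-model equations for y = 1*mu + u + e, with u ~ N(0, G * sigma_u^2).
h2 = 0.3
lam = (1 - h2) / h2                                    # sigma_e^2 / sigma_u^2
X = np.ones((n_animals, 1))
lhs = np.block([[X.T @ X, X.T],
                [X,       np.eye(n_animals) + lam * np.linalg.inv(G)]])
rhs = np.concatenate([X.T @ y, y])
sol = np.linalg.solve(lhs, rhs)
gebv = sol[1:]

print("correlation(GEBV, true BV):", round(np.corrcoef(gebv, tbv)[0, 1], 3))
```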

Application of Asymmetric Support Vector Regression Considering Predictive Propensity (예측성향을 고려한 비대칭 서포트벡터 회귀의 적용)

  • Lee, Dongju
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.45 no.1
    • /
    • pp.71-82
    • /
    • 2022
  • Most predictions made with machine learning are neutral predictions that treat the problem symmetrically, i.e., the predicted value is penalized equally for being smaller or larger than the actual value. However, in some situations asymmetric prediction such as over-prediction or under-prediction may be preferable to neutral prediction, and providing decision makers with a range of predictions can support better judgment. A method called Asymmetric Twin Support Vector Regression (ATSVR), based on TSVR (Twin Support Vector Regression) and its fast computation time, is proposed; two parameters control the asymmetry of the upper and lower widths of the ε-tube and the asymmetry of the penalty. In addition, prediction according to the propensities of over-prediction, under-prediction, and neutral prediction was performed by applying the existing GSVQR and the proposed ATSVR. When two parameters were used, both GSVQR and ATSVR could predict according to the prediction propensity, and ATSVR was more than twice as fast in computation time. In terms of accuracy there was no significant difference between ATSVR and GSVQR, but the figures indicate that GSVQR reflects the prediction propensity better than ATSVR. The accuracy of under-prediction or over-prediction was lower than that of neutral prediction. Using both parameters rather than only one of the two (p_1, p_2) appears to increase the change in prediction tendency, although depending on the situation it may be better to use only one of them. (A hedged sketch of an asymmetric ε-insensitive loss follows below.)
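
ATSVR itself solves paired TSVR quadratic programs, which are not reproduced here. The code below is only a stand-in showing the asymmetry the abstract describes: a linear model fitted by subgradient descent under an ε-insensitive loss whose upper and lower tube widths (eps_up, eps_down) and penalties (c_up, c_down) differ, so the fit can be pushed toward over- or under-prediction. The parameter names, optimizer, and data are illustrative assumptions.

```python
import numpy as np

def asymmetric_svr(X, y, eps_up=0.5, eps_down=0.1, c_up=1.0, c_down=3.0,
                   lr=0.01, epochs=500, lam=1e-3):
    """Linear model with an asymmetric epsilon-insensitive loss (subgradient descent)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        pred = X @ w + b
        resid = y - pred
        # Penalize under-prediction (resid > eps_up) and over-prediction (resid < -eps_down)
        # with different costs, which shifts the fitted curve up or down.
        grad = np.where(resid > eps_up, -c_up,
                        np.where(resid < -eps_down, c_down, 0.0))
        w -= lr * (X.T @ grad / len(y) + lam * w)
        b -= lr * grad.mean()
    return w, b

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = 1.5 * X[:, 0] + rng.normal(0, 0.3, 200)

w, b = asymmetric_svr(X, y)        # cheap under-prediction, costly over-prediction
print("share of over-predictions:", float(((X @ w + b) > y).mean()))
```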

The Prediction of Export Credit Guarantee Accident using Machine Learning (기계학습을 이용한 수출신용보증 사고예측)

  • Cho, Jaeyoung;Joo, Jihwan;Han, Ingoo
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.83-102
    • /
    • 2021
  • The government recently announced various policies for developing the big-data and artificial intelligence fields, giving the public a great opportunity through the disclosure of high-quality data held by public institutions. KSURE (Korea Trade Insurance Corporation) is a major public institution for financial policy in Korea, and the company is strongly committed to backing export companies with various programs. Nevertheless, there are still few cases of realized business models based on big-data analyses. In this situation, this paper aims to develop a new business model that can be applied to the ex-ante prediction of the likelihood of a credit-guarantee insurance accident. We utilize internal data from KSURE, which supports export companies in Korea, and apply machine learning models. We then compare the performance of predictive models including Logistic Regression, Random Forest, XGBoost, LightGBM, and DNN (Deep Neural Network). For decades, researchers have tried to find better models for predicting bankruptcy, since ex-ante prediction is crucial for corporate managers, investors, creditors, and other stakeholders. The prediction of financial distress or bankruptcy originated with Smith (1930), Fitzpatrick (1932), and Merwin (1942). One of the most famous models is Altman's Z-score model (Altman, 1968), which is based on multiple discriminant analysis and is still widely used in both research and practice; it uses five key financial ratios to predict the probability of bankruptcy within the next two years. Ohlson (1980) introduced a logit model to complement some limitations of previous models, and Elmer and Borowski (1988) developed and examined a rule-based, automated system for the financial analysis of savings and loans. Since the 1980s, researchers in Korea have also studied the prediction of financial distress or bankruptcy. Kim (1987) analyzed financial ratios and developed a prediction model; Han et al. (1995, 1996, 1997, 2003, 2005, 2006) constructed prediction models using various techniques including artificial neural networks; Yang (1996) applied multiple discriminant analysis and a logit model; and Kim and Kim (2001) used artificial neural network techniques for the ex-ante prediction of insolvent enterprises. Since then, many scholars have tried to predict financial distress or bankruptcy more precisely with diverse models such as Random Forest or SVM. One major distinction of our research from previous work is that we focus on examining the predicted probability of default for each sample case, not only on the classification accuracy of each model over the entire sample. Most predictive models in this paper reach a classification accuracy of about 70% on the entire sample; specifically, the LightGBM model shows the highest accuracy of 71.1% and the Logit model the lowest at 69%. However, we confirm that these results are open to multiple interpretations. In the business context, more emphasis must be placed on minimizing type 2 errors, which cause the more harmful operating losses for the guaranty company. Thus, we also compare classification accuracy by splitting the predicted probability of default into ten equal intervals.
When we examine the classification accuracy for each interval, the Logit model has the highest accuracy of 100% for the 0~10% range of the predicted probability of default, but a relatively lower accuracy of 61.5% for the 90~100% range. On the other hand, Random Forest, XGBoost, LightGBM, and DNN show more desirable results, with higher accuracy for both the 0~10% and 90~100% ranges but lower accuracy around the 50% range of the predicted probability of default. Regarding the distribution of samples across the predicted probabilities, both the LightGBM and XGBoost models place a relatively large number of samples in the 0~10% and 90~100% ranges. Although the Random Forest model has an advantage in classification accuracy for a small number of cases, LightGBM or XGBoost could be the more desirable models because they classify a large number of cases into the two extreme intervals of the predicted probability of default, even allowing for their relatively lower classification accuracy there. Considering the importance of type 2 error and total prediction accuracy, XGBoost and DNN show superior performance, followed by Random Forest and LightGBM, while logistic regression performs worst. However, each predictive model has a comparative advantage under different evaluation standards; for instance, the Random Forest model shows almost 100% accuracy for samples expected to have a high probability of default. Collectively, more comprehensive ensemble models can be constructed that combine multiple classification models and conduct majority voting to maximize overall performance. (A hedged sketch of the interval-wise accuracy analysis follows below.)
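
The KSURE data is internal and the paper's exact pipeline is not available, so the sketch below only illustrates the evaluation idea described above on synthetic data: fit two of the mentioned model families (logistic regression and a gradient-boosted tree as a stand-in for XGBoost/LightGBM), split the predicted default probabilities into ten equal intervals, and report the accuracy and sample count per interval. The dataset, features, and 0.5 decision threshold are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.7],
                           random_state=42)          # synthetic stand-in for guarantee data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

models = {"logit": LogisticRegression(max_iter=1000),
          "gbt":   GradientBoostingClassifier(random_state=42)}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]          # predicted probability of default
    print(f"\n{name}: overall accuracy = {(proba.round() == y_te).mean():.3f}")
    for lo in np.arange(0.0, 1.0, 0.1):              # ten equal probability intervals
        mask = (proba >= lo) & (proba < lo + 0.1)
        if mask.sum() == 0:
            continue
        acc = ((proba[mask] >= 0.5) == y_te[mask]).mean()
        print(f"  {lo:.1f}-{lo + 0.1:.1f}: n={mask.sum():4d}  accuracy={acc:.3f}")
```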

Mobility Prediction Algorithms Using User Traces in Wireless Networks

  • Luong, Chuyen;Do, Son;Park, Hyukro;Choi, Deokjai
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.8
    • /
    • pp.946-952
    • /
    • 2014
  • Mobility prediction is one of the hot topics exploiting location history information. It is useful not only for user-level applications such as people finders and recommendation-sharing services but also for system-level applications such as hand-off management, resource allocation, and quality of service in wireless networks. Most current prediction techniques use a set of significant locations without taking possible changes in location information into account. Markov-based, LZ-based, and Prediction-by-Pattern-Matching techniques consider interesting locations to enhance prediction accuracy, but they do not consider changes in those locations. In this paper, we propose an algorithm that integrates changing or newly emerging location information. The approach is based on the Active LeZi algorithm, but both new locations and all possible location contexts are updated in a tree of fixed depth. Furthermore, the tree is also updated when no new location is detected but the expected route changes. We find that our algorithm adapts well when predicting the next location. We evaluate the proposed system on a part of the Dartmouth dataset consisting of 1,026 users; an accuracy rate of more than 84% is achieved. (A minimal next-location predictor sketch follows below.)
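
The paper extends Active LeZi with a fixed-depth context tree; the sketch below is only a much simpler stand-in that captures the same prediction interface: an incrementally updated order-2 Markov model over location symbols, where counts keep growing as new locations appear in the trace. The trace, the model order, and the fallback rule are illustrative assumptions.

```python
from collections import defaultdict

class MarkovPredictor:
    """Order-k next-location predictor, updated incrementally as the trace grows."""
    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(lambda: defaultdict(int))
        self.history = []

    def update(self, location):
        ctx = tuple(self.history[-self.order:])
        if ctx:
            self.counts[ctx][location] += 1        # new locations are added on the fly
        self.history.append(location)

    def predict(self):
        ctx = tuple(self.history[-self.order:])
        candidates = self.counts.get(ctx)
        if not candidates:                         # unseen context: fall back to a shorter one
            candidates = self.counts.get(tuple(self.history[-1:]))
        if not candidates:
            return None
        return max(candidates, key=candidates.get)

trace = ["dorm", "library", "cafe", "dorm", "library", "cafe", "dorm", "library"]
model = MarkovPredictor(order=2)
for loc in trace:
    model.update(loc)
print("next location guess:", model.predict())     # expected: "cafe"
```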

TIME SERIES PREDICTION USING INCREMENTAL REGRESSION

  • Kim, Sung-Hyun;Lee, Yong-Mi;Jin, Long;Chai, Duck-Jin;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference
    • /
    • v.2
    • /
    • pp.635-638
    • /
    • 2006
  • Regression in conventional data-mining prediction techniques uses a model generated in the training step, and this model is applied to new input data without any change. If such a model is applied directly to a time series, prediction accuracy decreases. This paper proposes an incremental regression for time series prediction, such as typhoon track prediction, that accounts for characteristics of the time series that may change over time. It is composed of two steps. The first step executes a fractional process for applying input data to the regression model. The second step updates the model by using this information as new data. Additionally, the model is maintained with only recent data, kept in a queue. This approach has two advantages: it maintains the minimum information of the model in a matrix, so space complexity is reduced, and it prevents the error rate from growing by updating the model over time. The accuracy of the proposed method is measured by RME (Relative Mean Error) and RMSE (Root Mean Square Error). The results of the typhoon track prediction experiments show that the proposed technique, IMLR (Incremental Multiple Linear Regression), is more efficient than MLR (Multiple Linear Regression) and SVR (Support Vector Regression). (A hedged windowed-regression sketch follows below.)
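
The exact IMLR update is not given in the abstract, so the code below is only an approximation of the idea under stated assumptions: keep the most recent observations in a fixed-length queue and refit a multiple linear regression each time a new point arrives, so the model tracks a drifting series. The window size and the toy series are assumptions, and the one-step error shown is not the paper's RME/RMSE evaluation.

```python
import numpy as np
from collections import deque

class WindowedRegression:
    """Multiple linear regression refit over a queue of the most recent observations."""
    def __init__(self, window=30):
        self.buffer = deque(maxlen=window)
        self.coef = None

    def update(self, x, y):
        self.buffer.append((np.asarray(x, float), float(y)))
        X = np.array([np.append(xi, 1.0) for xi, _ in self.buffer])   # add intercept column
        Y = np.array([yi for _, yi in self.buffer])
        self.coef, *_ = np.linalg.lstsq(X, Y, rcond=None)

    def predict(self, x):
        return float(np.append(np.asarray(x, float), 1.0) @ self.coef)

# Toy drifting series: the mapping from lagged values to the next value changes over time.
rng = np.random.default_rng(3)
series = np.cumsum(rng.normal(0, 1, 300))
model = WindowedRegression(window=30)
errors = []
for t in range(2, len(series)):
    x = series[t - 2:t]                      # two lagged values as predictors
    if model.coef is not None:
        errors.append(abs(model.predict(x) - series[t]))
    model.update(x, series[t])
print("mean absolute one-step error:", round(float(np.mean(errors)), 3))
```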

A Prediction of Chip Quality using OPTICS (Ordering Points to Identify the Clustering Structure)-based Feature Extraction at the Cell Level (셀 레벨에서의 OPTICS 기반 특질 추출을 이용한 칩 품질 예측)

  • Kim, Ki Hyun;Baek, Jun Geol
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.40 no.3
    • /
    • pp.257-266
    • /
    • 2014
  • The semiconductor manufacturing industry is managed through a large number of parameters, from FAB, the initial step of production, to package test, the final step. Various methods for predicting quality and yield are required to reduce the production costs caused by the complicated manufacturing process. In order to increase the accuracy of quality prediction, significant features must be extracted from a large amount of data. In this study, we propose a method for extracting features from cell-level data of the probe test process using OPTICS, a density-based clustering algorithm, to improve the prediction accuracy of the quality of assembled chips that will proceed to package test. Two features extracted using OPTICS are used as input variables of the quality prediction model because they carry position information about cell defects. Running package test only on chips classified to the correct quality grade by the improved prediction method is expected to reduce production costs. (A minimal OPTICS feature-extraction sketch follows below.)
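
The two features used in the paper are not fully specified in the abstract beyond carrying defect-position information, so the code below is only an illustration: run scikit-learn's OPTICS on the (x, y) coordinates of failing cells of one chip and derive two simple candidate features (the number of defect clusters and the size of the largest cluster), which could then feed a quality classifier. The coordinates and the feature definitions are assumptions.

```python
import numpy as np
from sklearn.cluster import OPTICS

rng = np.random.default_rng(5)

# Illustrative failing-cell coordinates for one chip: two dense defect blobs plus scattered noise.
blob_a = rng.normal([20, 20], 1.0, size=(40, 2))
blob_b = rng.normal([80, 55], 1.5, size=(25, 2))
noise = rng.uniform(0, 100, size=(15, 2))
cells = np.vstack([blob_a, blob_b, noise])

labels = OPTICS(min_samples=10).fit(cells).labels_    # -1 marks unclustered (noise) cells

cluster_ids = [c for c in set(labels) if c != -1]
n_clusters = len(cluster_ids)
largest = max((np.sum(labels == c) for c in cluster_ids), default=0)

# Two candidate chip-level features for the downstream quality prediction model.
print("feature 1 (defect clusters):", n_clusters)
print("feature 2 (largest cluster size):", int(largest))
```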

Prediction of 305 Days Milk Production from Early Records in Dairy Cattle Using an Empirical Bayes Method

  • Pereira, J.A.C.;Suzuki, M.;Hagiya, K.
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.14 no.11
    • /
    • pp.1511-1515
    • /
    • 2001
  • A prediction of 305 d milk production from early records using an empirical Bayes method (EBM) was performed. The EBM was compared with the best predicted estimation (BPE), the test interval method (TIM), and the linearized Wood's model (LWM). Daily milk yields were obtained from 606 first-lactation Japanese Holstein cows in three herds. From each file of 305 daily records, 10 random test-day records with an interval of approximately one month were taken. The accuracies of these methods were compared using the absolute difference (AD) and the standard deviation (SD) of the differences between the actual and the estimated 305 d milk production. The results showed that in the early stage of lactation, EBM was superior in obtaining predictions with high accuracy. When all the herds were analyzed jointly, the AD over the first 5 test-day records averaged 373, 590, 917 and 1,042 kg for EBM, BPE, TIM, and LWM, respectively. The corresponding SD for EBM, BPE, TIM, and LWM averaged 488, 733, 747 and 1,605 kg. When the herds were analyzed separately, the EBM predictions retained high accuracy. When more information on the actual lactation was added to the prediction, TIM and LWM gradually achieved better accuracies; finally, in the last period of the lactation, the accuracy of both methods exceeded EBM and BPE. The AD for the last 2 samples, analyzing all the herds jointly, averaged 141, 142, 164, and 214 kg for LWM, TIM, EBM, and BPE, respectively. Under the current practice of collecting monthly records, early prediction of future milk production may be more accurate using EBM; alternatively, once enough information on the actual lactation has accumulated, TIM may obtain better accuracy in the latter stage of lactation. (A hedged shrinkage-style sketch follows below.)
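
The paper's empirical Bayes formulas are not reproduced here; the sketch below only illustrates the shrinkage idea behind such early prediction: a cow's early test-day deviation from the herd average is shrunk toward zero by a weight built from between- and within-cow variance estimates, and the shrunken deviation is added to the herd's 305 d mean. All numbers, the variance estimators, and the extrapolation rule are illustrative assumptions, not the paper's EBM.

```python
import numpy as np

rng = np.random.default_rng(11)

# Simulated daily yields for 50 herd mates over 305 d plus one target cow (illustrative only).
herd_level = 28.0
cow_effects = rng.normal(0, 2.0, 50)                        # between-cow spread
herd_daily = herd_level + cow_effects[:, None] + rng.normal(0, 3.0, (50, 305))
target_daily = herd_level + 3.0 + rng.normal(0, 3.0, 305)   # a truly above-average cow

n_early = 5                                                 # early monthly test-day records
early_days = np.arange(0, 305, 30)[:n_early]

# Variance components estimated from the herd mates' early records.
early_means = herd_daily[:, early_days].mean(axis=1)
var_within = herd_daily[:, early_days].var(axis=1, ddof=1).mean() / n_early
var_between = max(early_means.var(ddof=1) - var_within, 1e-6)
shrink = var_between / (var_between + var_within)           # empirical-Bayes style weight

herd_305 = herd_daily.sum(axis=1).mean()
deviation = target_daily[early_days].mean() - early_means.mean()
predicted_305 = herd_305 + shrink * deviation * 305

print("predicted 305 d yield:", round(predicted_305, 1))
print("actual 305 d yield   :", round(float(target_daily.sum()), 1))
```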

Relationships Between the Characteristics of the Business Data Set and Forecasting Accuracy of Prediction models (시계열 데이터의 성격과 예측 모델의 예측력에 관한 연구)

  • 이원하;최종욱
    • Journal of Intelligence and Information Systems
    • /
    • v.4 no.1
    • /
    • pp.133-147
    • /
    • 1998
  • Recently, many researchers have been involved in finding deterministic equations which can accurately predict future events, based on chaos theory or fractal theory. The theory says that some events which seem very random but are internally deterministic can be accurately predicted by fractal equations. In contrast to conventional methods such as the AR, MA, or ARIMA models, the fractal equation attempts to discover a deterministic order inherent in the time series data set. In discovering deterministic order, researchers have found that neural networks are much more effective than conventional statistical models. Even though the prediction accuracy of a network can differ depending on its topological structure and modifications of the algorithms, many researchers have asserted that neural network systems outperform other systems because of the non-linear behaviour of the network models, massive parallel processing, and generalization capability based on adaptive learning. However, recent surveys show that the prediction accuracy of forecasting models can be determined by the model structure and the data structure. In experiments based on actual economic data sets, the prediction accuracy of the neural network model was similar to the performance level of the conventional forecasting model. Especially for the data set which is deterministically chaotic, the AR model, a conventional statistical model, was not significantly different from the MLP model, a neural network model. This result shows that the forecasting model appropriate to a prediction task should be selected based on the characteristics of the time series data set. Analysis of the characteristics of the data set was performed by fractal analysis, measurement of the Hurst index, and measurement of Lyapunov exponents. In conclusion, for time series data that is deterministically chaotic, no significant difference in forecasting future events was found between a conventional forecasting model and a typical neural network model. (A minimal Hurst-exponent sketch follows below.)
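
Hurst-index measurement is one of the diagnostics named above; the code below is only a minimal rescaled-range (R/S) sketch of how such an index can be estimated for a series, with the window sizes and test data chosen arbitrarily. It is a simplified illustration, not the paper's analysis.

```python
import numpy as np

def hurst_rs(series, window_sizes=(16, 32, 64, 128)):
    """Estimate the Hurst exponent by regressing log(R/S) on log(window size)."""
    series = np.asarray(series, float)
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_values = []
        for start in range(0, len(series) - n + 1, n):      # non-overlapping windows
            chunk = series[start:start + n]
            dev = np.cumsum(chunk - chunk.mean())            # cumulative deviation from the mean
            r = dev.max() - dev.min()                        # range of the cumulative deviation
            s = chunk.std(ddof=1)                            # standard deviation
            if s > 0:
                rs_values.append(r / s)
        if rs_values:
            log_n.append(np.log(n))
            log_rs.append(np.log(np.mean(rs_values)))
    slope, _ = np.polyfit(log_n, log_rs, 1)
    return slope

rng = np.random.default_rng(2)
white_noise = rng.normal(0, 1, 4096)             # exponent expected roughly near 0.5
trending = np.cumsum(white_noise)                # strongly persistent series, closer to 1
print("white noise H ~", round(hurst_rs(white_noise), 2))
print("random walk  H ~", round(hurst_rs(trending), 2))
```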
