• Title/Summary/Keyword: prediction error methods


A Prediction Method of Learning Outcomes based on Regression Model for Effective Peer Review Learning (효율적인 피어리뷰 학습을 위한 회귀 모델 기반 학습성과 예측 방법)

  • Shin, Hyo-Joung;Jung, Hye-Wuk;Cho, Kwang-Su;Lee, Jee-Hyoung
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.5 / pp.624-630 / 2012
  • Peer review learning is a method that improves students' learning outcomes through feedback between students and through the observation and analysis of other students. One of the important problems in a peer review system is finding proper evaluators for each learner, considering the characteristics of students, in order to improve learning outcomes. Some peer review systems randomly assign evaluators to learners or choose evaluators based on limited strategies. However, these systems do not consider the various characteristics of the learners and evaluators who participate in peer reviews. In this paper, we propose a novel approach for predicting learning outcomes in peer review systems that considers various characteristics of learners and evaluators. The proposed approach extracts representative attributes from student profiles and predicts learning outcomes using various regression models. To verify how much outliers affect the prediction of learning outcomes, we also apply several outlier removal methods to the regression models and compare their predictive performance. The experimental results show that the SVR model without outlier removal has the best predictive performance, with an average error rate of 0.47%.
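The abstract does not name the outlier removal methods it compares. As a minimal sketch, assuming the common interquartile-range (IQR) rule and substituting ordinary least squares for the paper's SVR and other regression models, fitting with and without outlier removal might look like this. The helper names `iqr_filter` and `fit_line` and the toy data are hypothetical.

```python
from statistics import fmean, quantiles


def iqr_filter(xs, ys, k=1.5):
    """Drop samples whose target lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(ys, n=4)
    spread = q3 - q1
    lo, hi = q1 - k * spread, q3 + k * spread
    kept = [(x, y) for x, y in zip(xs, ys) if lo <= y <= hi]
    return [x for x, _ in kept], [y for _, y in kept]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (a stand-in for SVR)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Toy data: y = 2x + 1 with one corrupted target (hypothetical, not from the study).
xs = list(range(1, 11))
ys = [2 * x + 1 for x in xs]
ys[4] = 100
print("all samples:     ", fit_line(xs, ys))
print("outliers removed:", fit_line(*iqr_filter(xs, ys)))
```

Whether removal helps depends on whether the flagged points are errors or genuine extremes; in the study, the best model (SVR) performed better without outlier removal.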

Prediction of golf scores on the PGA tour using statistical models (PGA 투어의 골프 스코어 예측 및 분석)

  • Lim, Jungeun;Lim, Youngin;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.41-55 / 2017
  • This study predicts the average scores of the top 150 PGA golf players in 132 PGA Tour tournaments (2013-2015) using data mining techniques and statistical analysis. It also aims to predict the Top 10 and Top 25 players in 4 different playoffs. Linear and nonlinear regression methods were used to predict average scores. Stepwise regression, best subset selection, LASSO, ridge regression, and principal component regression were used as linear regression methods. Decision trees, bagging, gradient boosting, neural networks, random forests, and KNN were used as nonlinear regression methods. We found that the average score increases as fairway firmness, green height, or average maximum wind speed increases. We also found that the average score decreases as the number of one-putts, the scrambling variable, or the longest driving distance increases. All 11 models have low prediction error when predicting the average scores of the 2015 PGA tournaments, which were not included in the training set. However, the bagging and random forest models perform best among all models, and these two models have the highest prediction accuracy when predicting the Top 10 and Top 25 players in the 4 playoffs.
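The evaluation above trains on earlier seasons and measures prediction error on the held-out 2015 tournaments. A minimal sketch of that setup using KNN, one of the listed nonlinear methods, follows. The feature choice, the toy values, and the helper names `fit_scaler`, `knn_predict`, and `rmse` are assumptions for illustration, not the study's data or code.

```python
import math


def fit_scaler(rows):
    """Learn per-feature mean/std on the training rows only; return a scaler."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]

    def scale(row):
        return [(v - m) / s for v, m, s in zip(row, means, sds)]

    return scale


def knn_predict(train_X, train_y, x, k=3):
    """Average the targets of the k training rows closest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(actual, predicted):
    """Root mean squared prediction error on the held-out set."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Toy rows (not from the study): (fairway firmness, green height, max wind speed).
train_X = [[10.2, 3.1, 14.0], [11.5, 3.4, 18.0], [9.8, 2.9, 10.0], [12.0, 3.6, 20.0]]
train_y = [70.8, 71.9, 70.1, 72.6]  # "2013-2014" average scores (toy)
test_X, test_y = [[11.0, 3.3, 16.0]], [71.5]  # "2015" tournament (toy)

scale = fit_scaler(train_X)
train_s = [scale(r) for r in train_X]
preds = [knn_predict(train_s, train_y, scale(r), k=2) for r in test_X]
print(f"held-out RMSE: {rmse(test_y, preds):.3f}")
```

The scaler is fitted on the training seasons only, so no information from the held-out year leaks into the distance computation.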

Theoretical and experimental studies on influence of electrode variations in electrical resistivity survey for tunnel ahead prediction (터널 굴착면 전방조사를 위한 전기비저항 탐사에서 전극의 변화가 미치는 영향에 대한 이론 및 실험연구)

  • Hong, Chang-Ho;Chong, Song-Hun;Hong, Eun-Soo;Cho, Gye-Chun;Kwon, Tae-Hyuk
    • Journal of Korean Tunnelling and Underground Space Association / v.21 no.2 / pp.267-278 / 2019
  • A variety of tunnel-ahead prediction methods have been used to ensure safe construction during tunnel excavation. The pole-pole array of the electrical resistivity survey, one of these tunnel-ahead prediction methods, has been used to predict water-bearing sediments or weak zones located within a distance of five times the tunnel diameter. One of the most important steps is estimating the virgin ground resistivity, which can be obtained as follows: 1) calculating the contact area between the electrodes and the medium, and 2) assuming that the electrodes are equivalent spherical electrodes with the same surface area as the actual electrodes. This assumption is valid when the contact area is small and the distance between the electrodes is sufficient. Since the measured resistance generally varies with electrode size, shape, and spacing, the influence of these factors needs to be evaluated. In this study, theoretical equations were derived and experimental tests were conducted considering the size, shape, and spacing of cylindrical electrodes, the most commonly used electrode shape. This theoretical and experimental study shows that the equivalent half-spherical electrode assumption should be used with care when the ratio of penetration depth to radius of the cylindrical electrode is large, because the error may become larger.
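The two-step estimate described above (contact area, then an equivalent electrode of equal surface area) can be sketched as follows. This is a simplified textbook model, not the paper's derived equations. It assumes the tunnel face behaves as a uniform half-space, the whole cylinder surface including its flat tip is in contact, and the equivalent electrodes are hemispheres. Two such hemispheres of radii $a_1, a_2$ at spacing $d \gg a_1, a_2$ then have resistance $R = \frac{\rho}{2\pi}\left(\frac{1}{a_1} + \frac{1}{a_2} - \frac{2}{d}\right)$. All function names are hypothetical.

```python
import math


def contact_area(radius, depth):
    """Wetted area of a cylindrical electrode pushed `depth` into the medium:
    lateral surface plus the flat tip (assumed fully in contact)."""
    return 2 * math.pi * radius * depth + math.pi * radius ** 2


def equivalent_hemisphere_radius(area):
    """Radius of a hemisphere whose curved surface (2*pi*a^2) equals `area`."""
    return math.sqrt(area / (2 * math.pi))


def resistance_between(rho, a1, a2, spacing):
    """Two-electrode resistance of hemispheres a1, a2 on a uniform half-space;
    only valid when spacing >> a1, a2."""
    return rho / (2 * math.pi) * (1 / a1 + 1 / a2 - 2 / spacing)


def resistivity_from_resistance(resistance, a1, a2, spacing):
    """Invert resistance_between for the ground resistivity rho."""
    return 2 * math.pi * resistance / (1 / a1 + 1 / a2 - 2 / spacing)


# Hypothetical electrode: 5 mm radius, 50 mm penetration, 1 m spacing.
a = equivalent_hemisphere_radius(contact_area(0.005, 0.05))
print(f"equivalent radius: {a * 1000:.1f} mm")
print(f"rho for R = 500 ohm: {resistivity_from_resistance(500.0, a, a, 1.0):.1f} ohm-m")
```

In this example the depth-to-radius ratio is 10, so a long, thin cylinder is replaced by a compact hemisphere. That is the regime the abstract warns about, where the equal-area approximation may carry larger error.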

Determination of Survival of Gastric Cancer Patients With Distant Lymph Node Metastasis Using Prealbumin Level and Prothrombin Time: Contour Plots Based on Random Survival Forest Algorithm on High-Dimensionality Clinical and Laboratory Datasets

  • Zhang, Cheng;Xie, Minmin;Zhang, Yi;Zhang, Xiaopeng;Feng, Chong;Wu, Zhijun;Feng, Ying;Yang, Yahui;Xu, Hui;Ma, Tai
    • Journal of Gastric Cancer / v.22 no.2 / pp.120-134 / 2022
  • Purpose: This study aimed to identify prognostic factors for patients with distant lymph node-involved gastric cancer (GC) using a machine learning algorithm, a method that offers considerable advantages and new prospects for high-dimensional biomedical data exploration. Materials and Methods: This study employed 79 features of clinical pathology, laboratory tests, and therapeutic details from 289 GC patients whose distant lymphadenopathy was presented as the first episode of recurrence or metastasis. Outcomes were measured as any-cause death events and survival months after distant lymph node metastasis. A prediction model was built based on possible outcome predictors using a random survival forest algorithm and confirmed by 5×5 nested cross-validation. The effects of single variables were interpreted using partial dependence plots. A contour plot was used to visually represent survival prediction based on 2 predictive features. Results: The median survival time of patients with GC with distant nodal metastasis was 9.2 months. The optimal model incorporated the prealbumin level and the prothrombin time (PT), and yielded a prediction error of 0.353. The inclusion of other variables resulted in poorer model performance. Patients with higher serum prealbumin levels or shorter PTs had a significantly better prognosis. The predicted one-year survival rate was stratified and illustrated as a contour plot based on the combined effect of the prealbumin level and the PT. Conclusions: Machine learning is useful for identifying the important determinants of cancer survival using high-dimensional datasets. The prealbumin level and the PT on distant lymph node metastasis are the 2 most crucial factors in predicting the subsequent survival time of advanced GC.
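Random survival forest implementations such as the randomForestSRC R package commonly report prediction error as 1 minus Harrell's concordance index (C-index). Assuming that is the definition behind the reported 0.353, the metric can be sketched as below. Ties in survival time are simply skipped here, which is a simplification, and the function names are hypothetical.

```python
def harrell_c(times, events, risks):
    """Harrell's concordance index.

    A pair (i, j) is comparable when subject i had the event (events[i] true)
    and times[i] < times[j]. It is concordant when the model assigns i the
    higher risk; equal risks count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def prediction_error(times, events, risks):
    """Error rate as 1 - C: 0 is perfect ranking, 0.5 is random guessing."""
    return 1.0 - harrell_c(times, events, risks)
```

Under this definition, an error of 0.353 corresponds to a C-index of 0.647.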

A Study on the Development of Construction Budget Estimating Model for Public Office Buildings based on Artificial Neural Network (인공신경망 기반의 공공청사 공사비 예산 예측모델 개발 연구)

  • Kim, Hyeon Jin;Kim, Han Soo
    • Korean Journal of Construction Engineering and Management / v.24 no.5 / pp.22-34 / 2023
  • Accurately predicting the construction cost budget in the early stages of construction projects is crucial for supporting the client's decision-making and achieving the objectives of the project. This holds true for public construction projects as well. However, the current methods for predicting construction cost budgets in the early stages of public construction projects are not sufficiently accurate or reliable, indicating a need for improvement. The objective of this study is to develop a construction cost budget prediction model, based on an artificial neural network (ANN), that can be used in the early stages of public building projects. In this study, an ANN model was developed using the SPSS Statistics program and data provided by the Public Procurement Service. The level of construction cost budget prediction was analyzed, and the accuracy of the model was validated through additional testing. The validation results showed that the estimation error of the developed ANN model falls within a range usable in the early stages of projects, indicating the potential to predict construction cost budgets more accurately by incorporating various project conditions.

An Intelligent Decision Support System for Selecting Promising Technologies for R&D based on Time-series Patent Analysis (R&D 기술 선정을 위한 시계열 특허 분석 기반 지능형 의사결정지원시스템)

  • Lee, Choongseok;Lee, Suk Joo;Choi, Byounggu
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.79-96 / 2012
  • As the pace of competition accelerates dramatically and the complexity of change grows, a variety of studies have been conducted to improve firms' short-term performance and enhance their long-term survival. In particular, researchers and practitioners have paid attention to identifying promising technologies that give a firm competitive advantage. Discovering promising technologies depends on how a firm evaluates the value of technologies, and many evaluation methods have therefore been proposed. Approaches based on expert opinion have been widely accepted for predicting the value of technologies. While this approach provides in-depth analysis and ensures the validity of the results, it is usually cost- and time-ineffective and is limited to qualitative evaluation. Many studies attempt to forecast the value of technology using patent information to overcome the limitations of the expert-opinion approach. Patent-based technology evaluation has served as a valuable assessment approach for technological forecasting because patents contain a full and practical description of a technology in a uniform structure. Furthermore, patents provide information that is not disclosed in any other source. Although the patent-based approach has contributed to our understanding of how to predict promising technologies, it has some limitations: predictions have been based on past patent information, and the interpretations of patent analyses are not consistent. To fill this gap, this study proposes a technology forecasting methodology that integrates the patent information approach with an artificial intelligence method. The methodology consists of three modules: evaluation of technology promise, implementation of a technology value prediction model, and recommendation of promising technologies. In the first module, the promise of technologies is evaluated from three different and complementary dimensions: impact, fusion, and diffusion.
The impact of technologies refers to their influence on the development and improvement of future technologies, and it is also clearly associated with their monetary value. The fusion of technologies denotes the extent to which a technology fuses different technologies, and it represents the breadth of search underlying the technology. Fusion can be calculated per technology or per patent, so this study measures two types of fusion index: fusion index per technology and fusion index per patent. Finally, the diffusion of technologies denotes their degree of applicability across scientific and technological fields. In the same vein, diffusion index per technology and diffusion index per patent are considered separately. In the second module, the technology value prediction model is implemented using an artificial intelligence method. This study uses the values of five indexes (i.e., impact index, fusion index per technology, fusion index per patent, diffusion index per technology, and diffusion index per patent) at different times (e.g., $t-n, t-n-1, t-n-2, \cdots$) as input variables. The output variables are the values of the five indexes at time $t$, which are used for learning. The learning method adopted in this study is the backpropagation algorithm. In the third module, this study recommends the final promising technologies based on the analytic hierarchy process (AHP). AHP provides the relative importance of each index, leading to a final promising index for each technology. The applicability of the proposed methodology is tested using U.S. patents in International Patent Classification class G06F (i.e., electronic digital data processing) from 2000 to 2008. The results show that the mean absolute error of the proposed methodology is lower than that of multiple regression analysis for the fusion indexes. However, the mean absolute error of the proposed methodology is slightly higher than that of multiple regression analysis.
These unexpected results may be explained, in part, by the small number of patents. Because this study uses only patent data in class G06F, the number of sample patents is relatively small, leading to incomplete learning of the complex artificial intelligence structure. In addition, the fusion index per technology and the impact index were found to be important criteria for predicting promising technologies. This study attempts to extend existing knowledge by proposing a new methodology for predicting technology value that integrates patent information analysis and an artificial intelligence network. By providing a quantitative prediction methodology, it helps managers who plan technology development and policymakers who implement technology policy. In addition, this study could help other researchers by providing a deeper understanding of the complex field of technological forecasting.
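The second and third modules described above can be sketched as follows. Index histories are turned into lagged input/target pairs for a learner, forecasts are scored by mean absolute error, and AHP criterion weights are derived from a pairwise-comparison matrix. This is a minimal sketch under stated assumptions. It reads the input times $t-n, t-n-1, \ldots$ as a forecast horizon `horizon` $= n$ plus `n_lags` consecutive past periods. It uses power iteration for the AHP principal eigenvector and omits the backpropagation network itself, since any regressor can consume `X` and `Y`. All names are hypothetical.

```python
def lagged_samples(series, horizon, n_lags):
    """Turn a per-period series of index vectors into (input, target) pairs.

    series[t] holds the index values for period t (e.g. impact, two fusion and
    two diffusion indexes). The input for target t concatenates the vectors at
    t - horizon, t - horizon - 1, ..., t - horizon - n_lags + 1.
    """
    X, Y = [], []
    for t in range(horizon + n_lags - 1, len(series)):
        X.append([v for k in range(n_lags) for v in series[t - horizon - k]])
        Y.append(list(series[t]))
    return X, Y


def mae(actual, predicted):
    """Mean absolute error, the comparison measure named in the abstract."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def ahp_weights(pairwise, iterations=100):
    """Priority vector of an AHP pairwise-comparison matrix via power iteration.

    pairwise[i][j] states how much more important criterion i is than j
    (a reciprocal matrix). The normalized principal eigenvector gives weights.
    """
    n = len(pairwise)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(pairwise[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    return w
```

For a perfectly consistent comparison matrix, where each entry is the ratio of two weights, the iteration recovers those weights exactly. Real expert judgments are only approximately consistent, which is why AHP practice usually also checks a consistency ratio.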

A Comparative Study On Accident Prediction Model Using Nonlinear Regression And Artificial Neural Network, Structural Equation for Rural 4-Legged Intersection (비선형 회귀분석, 인공신경망, 구조방정식을 이용한 지방부 4지 신호교차로 교통사고 예측모형 성능 비교 연구)

  • Oh, Ju Taek;Yun, Ilsoo;Hwang, Jeong Won;Han, Eum
    • Journal of Korean Society of Transportation / v.32 no.3 / pp.266-279 / 2014
  • For the evaluation of roadway safety, diverse methods have been applied, including before-after studies, simple comparisons using historical traffic accident data, and methods based on expert opinion or the literature. In particular, many studies have developed traffic accident prediction models to identify the critical elements causing accidents and to evaluate the level of safety. A traffic accident prediction model must secure both predictability and transferability. Predictability allows the model to predict accident frequency accurately, both qualitatively and quantitatively; transferability allows the model to be used at other locations with acceptable accuracy. To this end, traffic accident prediction models using nonlinear regression, an artificial neural network, and a structural equation model were developed in this study. The predictability and transferability of the three models were compared, based on mean absolute deviation and mean squared prediction error, using a model development data set collected from 90 signalized intersections and a model validation data set from 33 other signalized intersections. In the comparison using the model development data set, the artificial neural network showed the highest predictability. However, the nonlinear regression model was found to be the most appropriate in the comparison using the model validation data set. In conclusion, the artificial neural network has a strong ability to represent the relationship between the frequency of traffic accidents and traffic and road design elements. However, its predictability decreased significantly when it was applied to new data that were not used in model development.
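The comparison above scores each model on the development intersections (predictability) and on the separate validation intersections (transferability) with two error measures. A minimal sketch of that bookkeeping follows, using the standard definitions of mean absolute deviation and mean squared prediction error. The `compare` helper and its input layout are hypothetical.

```python
def mad(observed, predicted):
    """Mean absolute deviation between observed and predicted accident counts."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mspe(observed, predicted):
    """Mean squared prediction error."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def compare(models, dev, val):
    """Score each model on the development and validation sets.

    `models` maps a model name to (dev_predictions, val_predictions); `dev` and
    `val` are the observed accident counts at the two sets of intersections.
    Returns {name: {"dev": (MAD, MSPE), "val": (MAD, MSPE)}}.
    """
    return {
        name: {"dev": (mad(dev, d), mspe(dev, d)), "val": (mad(val, v), mspe(val, v))}
        for name, (d, v) in models.items()
    }
```

A model that is best on `"dev"` but degrades sharply on `"val"` shows the pattern the study reports for the neural network: a good fit to the development data with poor transfer to new sites.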

Removal of Seabed Multiples in Seismic Reflection Data using Machine Learning (머신러닝을 이용한 탄성파 반사법 자료의 해저면 겹반사 제거)

  • Nam, Ho-Soo;Lim, Bo-Sung;Kweon, Il-Ryong;Kim, Ji-Soo
    • Geophysics and Geophysical Exploration / v.23 no.3 / pp.168-177 / 2020
  • Seabed multiple reflections (seabed multiples) are the main cause of misinterpretation of primary reflections in both shot gathers and stack sections. Accordingly, seabed multiples need to be suppressed during data processing. Conventional model-driven methods, such as prediction-error deconvolution and Radon filtering, and data-driven methods, such as the surface-related multiple elimination technique, have been used to attenuate multiple reflections. However, most processing workflows require time-consuming testing and selection of processing parameters, in addition to computational power and skilled data-processing techniques. To attenuate seabed multiples in seismic reflection data, input gathers with seabed multiples and label gathers without seabed multiples were generated via numerical modeling using the Marmousi2 velocity structure. The training data consisted of normal-moveout-corrected common midpoint gathers, which were fed into a U-Net neural network. Judged by the image similarity between the prediction results and the target data, the trained model effectively attenuated the seabed multiples, and it also demonstrated good applicability to field data.
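Prediction-error deconvolution, one of the conventional baselines named above (not the paper's U-Net approach), exploits the periodicity of seabed reverberations. Whatever can be predicted from the trace one seabed period earlier is treated as multiple energy and subtracted. A minimal single-trace sketch follows. It assumes a prediction lag equal to the seabed two-way time in samples, a least-squares (Wiener) filter built from the trace autocorrelation, and small prewhitening. A plain Gaussian-elimination solver stands in for the Levinson recursion used in production code. Function names and the toy trace are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[k] = sum_t x[t] * x[t + k], k = 0..max_lag."""
    return [sum(x[t] * x[t + k] for t in range(len(x) - k)) for k in range(max_lag + 1)]


def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting (adequate for short filters)."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - tail) / m[r][r]
    return sol


def predictive_deconvolution(trace, lag, n_filter, prewhitening=0.001):
    """Gapped prediction-error filtering of a single trace.

    Solves sum_k a[k] r[|j - k|] = r[lag + j] for the prediction filter `a`,
    then outputs e[t] = x[t] - sum_k a[k] x[t - lag - k], i.e. the part of the
    trace that is not predictable from `lag` samples earlier.
    """
    r = autocorrelation(trace, lag + n_filter - 1)
    toeplitz = [[r[abs(i - j)] for j in range(n_filter)] for i in range(n_filter)]
    for i in range(n_filter):
        toeplitz[i][i] *= 1.0 + prewhitening  # stabilizes the solve
    a = solve_linear(toeplitz, [r[lag + j] for j in range(n_filter)])
    out = []
    for t, sample in enumerate(trace):
        predicted = sum(a[k] * trace[t - lag - k] for k in range(n_filter) if t - lag - k >= 0)
        out.append(sample - predicted)
    return out


# Toy trace: a seabed primary at sample 10 followed by reverberations every 20
# samples with reflection coefficient -0.5 (hypothetical, not the study's data).
trace = [0.0] * 200
for m in range(9):
    trace[10 + 20 * m] = (-0.5) ** m
result = predictive_deconvolution(trace, lag=20, n_filter=5)
print([round(result[10 + 20 * m], 4) for m in range(4)])
```

The primary at sample 10 passes through unchanged while the periodic multiples are strongly attenuated. The method relies on a known, stable period, which real gathers only approximately satisfy. That is one source of the parameter testing the abstract describes as time-consuming.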

Integrating UAV Remote Sensing with GIS for Predicting Rice Grain Protein

  • Sarkar, Tapash Kumar;Ryu, Chan-Seok;Kang, Ye-Seong;Kim, Seong-Heon;Jeon, Sae-Rom;Jang, Si-Hyeong;Park, Jun-Woo;Kim, Suk-Gu;Kim, Hyun-Jin
    • Journal of Biosystems Engineering / v.43 no.2 / pp.148-159 / 2018
  • Purpose: Unmanned air vehicle (UAV) remote sensing was applied to test various vegetation indices and to build prediction models of rice protein content for monitoring grain quality and proper management practice. Methods: Images were acquired using NIR (Green, Red, NIR), RGB, and RE (Blue, Green, Red-edge) cameras mounted on the UAV. Sampling was done synchronously at geo-referenced points, and GPS locations were recorded. Paddy samples were air-dried to 15% moisture content, then dehulled and milled to 92% milling yield, and their protein content was measured by near-infrared spectroscopy. Results: Considering all 54 samples, the artificial neural network performed better than the models developed by PR (polynomial regression), SLR (simple linear regression), and PLSR (partial least squares regression), with $R^2$ (coefficient of determination) of 0.740, NSE (Nash-Sutcliffe model efficiency coefficient) of 0.733, and RMSE (root mean square error) of 0.187%. PLSR calibration models showed results almost identical to those of PR, with $R^2$ of 0.663 and RMSE of 0.169% for cloud-free samples and $R^2$ of 0.491 and RMSE of 0.217% for cloud-shadowed samples. However, the validation models performed poorly. This study revealed a highly significant correlation between NDVI (normalized difference vegetation index) and rice protein content. For the cloud-free samples, the SLR models showed $R^2 = 0.553$ and RMSE = 0.210%, and for the cloud-shadowed samples, $R^2 = 0.479$ and RMSE = 0.225%. Conclusion: There is a significant correlation between spectral bands and grain protein content. Artificial neural networks have strong advantages in fitting nonlinear problems when a sigmoid activation function is used in the hidden layer. Quantitatively, the neural network model achieved higher precision, with a mean absolute relative error (MARE) of 2.18% and a root mean square error (RMSE) of 0.187%.
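The abstract reports four goodness-of-fit measures. It lists $R^2$ and NSE separately (0.740 vs. 0.733), which suggests that $R^2$ here is the squared correlation between observed and predicted values rather than $1 - \mathrm{SSE}/\mathrm{SST}$. That reading is an assumption, as is the function name `fit_metrics` below.

```python
import math
from statistics import fmean


def fit_metrics(observed, predicted):
    """Goodness-of-fit measures for observed vs. predicted protein content (%).

    R2   : squared Pearson correlation (insensitive to bias)
    NSE  : Nash-Sutcliffe efficiency, 1 - SSE / SST (penalizes bias)
    RMSE : root mean square error, in the units of the data
    MARE : mean absolute relative error, as a fraction (x100 for %)
    """
    mo, mp = fmean(observed), fmean(predicted)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mo) ** 2 for o in observed)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_p = sum((p - mp) ** 2 for p in predicted)
    return {
        "R2": cov ** 2 / (sst * var_p),
        "NSE": 1 - sse / sst,
        "RMSE": math.sqrt(sse / len(observed)),
        "MARE": fmean(abs(o - p) / abs(o) for o, p in zip(observed, predicted)),
    }
```

The two measures diverge when predictions are biased. For example, observed `[1, 2, 3]` against predicted `[2, 4, 6]` gives `R2 = 1.0` but `NSE = -6.0`, which is why reporting NSE alongside $R^2$ is informative.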

Yield Prediction of Chinese Cabbage (Brassicaceae) Using Broadband Multispectral Imagery Mounted Unmanned Aerial System in the Air and Narrowband Hyperspectral Imagery on the Ground

  • Kang, Ye Seong;Ryu, Chan Seok;Kim, Seong Heon;Jun, Sae Rom;Jang, Si Hyeong;Park, Jun Woo;Sarkar, Tapash Kumar;Song, Hye young
    • Journal of Biosystems Engineering / v.43 no.2 / pp.138-147 / 2018
  • Purpose: Compared with a broadband multispectral imaging sensor, a narrowband hyperspectral imaging sensor with high-dimensional spectral bands is advantageous for identifying the reflectance of the crop canopy in each wavelength range, because the significant spectral bands for predicting crop yield can be selected. The images acquired by each imaging sensor were used to develop models for predicting Chinese cabbage yield. Methods: Models for predicting Chinese cabbage (Brassica campestris L.) yield from multispectral images acquired by an unmanned aerial vehicle (UAV) were developed by simple linear regression (SLR) using vegetation indices and by forward stepwise multiple linear regression (MLR) using four spectral bands. The model using ground-based hyperspectral images was developed by forward stepwise MLR from significant spectral bands selected by dimension reduction methods based on a partial least squares regression (PLSR) model of high precision and accuracy. Results: The SLR model based on the multispectral image could not predict the yield well because of its low sensitivity at high fresh weight. Although the MLR model improved sensitivity at high fresh weight, its precision and accuracy were unsuitable for predicting the yield, with $R^2$ of 0.697, root-mean-square error (RMSE) of 1170 g/plant, and relative error (RE) of 67.1%. When the significant spectral bands for predicting the yield were selected from hyperspectral images, the MLR model using four spectral bands showed high precision and accuracy, with $R^2$ of 0.891, RMSE of 616 g/plant, and RE of 35.3%. Conclusions: The PLSR model ($R^2$ of 0.896, RMSE of 576.7 g/plant, RE of 33.1%) showed little difference in precision and accuracy compared with the MLR model. If a multispectral imaging sensor composed of the significant spectral bands is produced, the crop yield of a wide area can be predicted using a UAV.
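The SLR models above regress yield on a vegetation index. A minimal sketch follows, using NDVI $= (\mathrm{NIR} - \mathrm{Red}) / (\mathrm{NIR} + \mathrm{Red})$ as the example index. The abstract does not say which indices were used, so that choice is an assumption. The abstract also does not define RE. However, the three reported pairs are all consistent with RE = RMSE / mean observed yield × 100 (1170/0.671 ≈ 616/0.353 ≈ 576.7/0.331 ≈ 1743 to 1745 g/plant), so the sketch assumes that definition. Function names are hypothetical.

```python
import math
from statistics import fmean


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def fit_slr(x, y):
    """Simple linear regression y = slope * x + intercept by least squares."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def rmse_and_re(observed, predicted):
    """RMSE (g/plant) and relative error in %, assumed to be RMSE / mean(observed)."""
    rmse = math.sqrt(fmean((o - p) ** 2 for o, p in zip(observed, predicted)))
    return rmse, 100 * rmse / fmean(observed)
```

Because RE scales RMSE by the mean yield, it allows comparison across crops or seasons with different yield levels, whereas RMSE stays in g/plant.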