• Title/Summary/Keyword: Supervised prediction


A Win/Lose prediction model of Korean professional baseball using machine learning technique

  • Seo, Yeong-Jin;Moon, Hyung-Woo;Woo, Yong-Tae
    • Journal of the Korea Society of Computer and Information / v.24 no.2 / pp.17-24 / 2019
  • This paper proposes a new model for effectively predicting wins and losses in Korean professional baseball games using machine learning. We used basic baseball data and Sabermetrics data that are highly correlated with scoring, and trained a deep learning model by supervised learning. The network used the Drop-Out algorithm and the ReLU activation function. From the trained neural network, each team's expected odds of winning were calculated from the predictions of its expected score and expected loss. The team with the higher expected winning rate was predicted as the winner. To verify the effectiveness of the proposed model, we compared the actual winning percentage, the Pythagorean expectation, and the winning percentage predicted by the proposed model.
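
The Pythagorean expectation used as a baseline above has a standard closed form, sketched here for reference. This is not the authors' neural network: the function names, the classic exponent of 2, the tie rule, and the reading of "expected loss" as runs allowed are all assumptions made for illustration.

```python
def pythagorean_expectation(runs_scored: float, runs_allowed: float,
                            exponent: float = 2.0) -> float:
    """Estimate a winning rate from runs scored and runs allowed."""
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("runs must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        return 0.5  # no information either way (assumed convention)
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def predict_winner(team_a: tuple, team_b: tuple) -> str:
    """Pick the team with the higher expected winning rate.

    Each argument is a hypothetical (expected_score, expected_loss) pair,
    such as a model might output. Ties go to team A (assumed rule).
    """
    rate_a = pythagorean_expectation(*team_a)
    rate_b = pythagorean_expectation(*team_b)
    return "A" if rate_a >= rate_b else "B"
```

A team scoring as many runs as it allows gets a rate of 0.5, and the rate rises toward 1 as runs scored dominate.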

Cross-Project Pooling of Defects for Handling Class Imbalance

  • Catherine, J.M.;Djodilatchoumy, S
    • International Journal of Computer Science & Network Security / v.22 no.10 / pp.11-16 / 2022
  • Applying predictive analytics to predict software defects has improved overall quality and decreased maintenance costs. Many supervised and unsupervised learning algorithms have been used for defect prediction on publicly available datasets. Most of these datasets suffer from an imbalance in the output classes. We study the impact of class imbalance in defect datasets on the efficiency of the defect prediction model and propose a Cross-Project Pooling (CPP) method for handling imbalance in the dataset. The performance of the methods is evaluated using Matthews Correlation Coefficient (MCC), Recall, and Accuracy. The proposed sampling technique shows significant improvement in the efficiency of the classifier in predicting defects.
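
The three measures named above can all be computed from a binary confusion matrix. The sketch below uses their standard definitions; the function names and the zero-denominator conventions are assumptions, and nothing here reproduces the CPP method itself.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


def recall(tp: int, fn: int) -> float:
    """Fraction of actual defects that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that were correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0
```

On an imbalanced set with 5 defective and 95 clean modules, a classifier that always predicts "clean" (tp=0, tn=95, fp=0, fn=5) reaches 0.95 accuracy while recall and MCC are both 0, which is why accuracy alone is a weak measure on such data.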

Mobile health service user characteristics analysis and churn prediction model development (모바일 헬스 서비스 사용자 특성 분석 및 이탈 예측 모델 개발)

  • Han, Jeong Hyeon;Lee, Joo Yeoun
    • Journal of the Korean Society of Systems Engineering / v.17 no.2 / pp.98-105 / 2021
  • As average life expectancy rises, the population is aging and chronic diseases are becoming more common. This has increased the importance of healthy living and health management, and interest in mobile health services is growing thanks to the development of ICT (information and communication technologies) and the spread of smartphones. To meet this interest, many mobile services related to daily health are being launched. In this study, we analyzed the characteristics of people who actually use mobile health services and developed a churn prediction model using machine learning, applying decision tree and ensemble methods. We found that continued use of a mobile health service can be encouraged by providing features that require frequent visits, suggesting achievable activity missions, and guiding users to connect sensors that measure their activity.

R-Trader: An Automatic Stock Trading System based on Reinforcement learning (R-Trader: 강화 학습에 기반한 자동 주식 거래 시스템)

  • 이재원;김성동;이종우;채진석
    • Journal of KIISE:Software and Applications / v.29 no.11 / pp.785-794 / 2002
  • Automatic stock trading systems should be able to solve various kinds of optimization problems, such as market trend prediction, stock selection, and trading strategies, in a unified framework. However, most previous trading systems based on supervised learning are limited in their ultimate performance because they do not primarily address the integration of those subproblems. This paper proposes a stock trading system, called R-Trader, based on reinforcement learning, regarding the process of stock price changes as a Markov decision process (MDP). Reinforcement learning is suitable for joint optimization of predictions and trading strategies. R-Trader adopts two popular reinforcement learning algorithms, temporal-difference (TD) and Q, for selecting stocks and optimizing other trading parameters, respectively. Technical analysis is also adopted to devise the input features of the system, and value functions are approximated by feedforward neural networks. Experimental results on the Korean stock market show that the proposed system outperforms the market average and also a simple trading system trained by supervised learning, both in profit and in risk management.
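
For readers unfamiliar with the Q algorithm mentioned above, here is a minimal tabular sketch of the standard Q-learning update. It is not R-Trader: the abstract says value functions are approximated by feedforward neural networks, whereas this sketch uses a lookup table, and the state names, actions, reward, and hyperparameters are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one Q-learning step and return the temporal-difference error.

    q maps (state, action) pairs to estimated values.
    alpha is the learning rate, gamma the discount factor.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


# Hypothetical usage: one observed transition in a toy trading MDP.
actions = ["buy", "hold", "sell"]
q_table = defaultdict(float)  # unseen pairs start at 0.0
error = q_update(q_table, "uptrend", "buy", 1.0, "flat", actions)
```

The update moves the stored value a fraction `alpha` of the way toward the observed reward plus the discounted best value of the next state.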

A Fusion Method of Co-training and Label Propagation for Prediction of Bank Telemarketing (은행 텔레마케팅 예측을 위한 레이블 전파와 협동 학습의 결합 방법)

  • Kim, Aleum;Cho, Sung-Bae
    • Journal of KIISE / v.44 no.7 / pp.686-691 / 2017
  • Telemarketing has become central to industry marketing in the information society. Recently, machine learning has been applied in many areas, especially financial prediction. Financial data is largely unlabeled, and labeling it by hand is difficult. In this paper, we propose a fusion method of semi-supervised learning that automatically labels unlabeled data for telemarketing prediction. Specifically, we integrate the labeling results of label propagation and co-training with a decision tree. Data with low reliability are removed, and the data that receive the same label from both labeling methods are extracted. After these are added to the training set, a decision tree is trained on all of them. To confirm the usefulness of the proposed method, we conduct experiments with a real telemarketing dataset from a Portuguese bank. The accuracy of the proposed method is 83.39%, which is 1.82% higher than that of the conventional method, and its precision is 19.37%, which is 2.67% higher than that of the conventional method. A t-test shows that the proposed method performs better.
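
The filtering step described above (drop low-reliability data, keep only samples that both labelers agree on) can be sketched as follows. The data layout, the function name, and the 0.8 reliability threshold are assumptions; the abstract does not say how reliability is measured or what cutoff is used.

```python
def select_pseudo_labels(preds_a, preds_b, threshold=0.8):
    """Return (index, label) pairs that both labelers agree on.

    preds_a and preds_b are equal-length lists of (label, confidence)
    tuples, e.g. from label propagation and from co-training.
    A sample is kept only if the labels match and both confidences
    reach the threshold.
    """
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    selected = []
    for index, ((label_a, conf_a), (label_b, conf_b)) in enumerate(
            zip(preds_a, preds_b)):
        if label_a == label_b and min(conf_a, conf_b) >= threshold:
            selected.append((index, label_a))
    return selected
```

The selected samples would then be appended to the labeled training set before the final decision tree is trained.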

Predicting Early Retirees Using Personality Data (인성 데이터를 활용한 조기 퇴사자 예측)

  • Kim, Young Park;Kim, Hyoung Joong
    • Journal of Digital Contents Society / v.19 no.1 / pp.141-147 / 2018
  • This study analyzed employees who left early, staying at the company no longer than three years, using one company's personality evaluation data. The prediction model was built separately for two groups: the manufacturing group and the R&D group. Independent variables were selected by the stepwise method. Among various supervised learning methods, a logistic regression model was selected as the prediction model and trained with cross-validation to prevent over-fitting or under-fitting. The accuracy for each group was confirmed with a confusion matrix. The most influential factor for early retirement was "immersion" in the manufacturing group and "antisocial" in the R&D group. Past work concentrated on collecting data by questionnaire and identifying factors strongly related to retirement, whereas this study suggests a sustainable early retirement prediction model based on the tangible outcomes of the recruitment process.
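
Cross-validation, used above to guard against over-fitting, splits the data into folds that take turns as the validation set. The sketch below shows only the index bookkeeping; the number of folds is an assumption (the abstract does not state it), the function name is hypothetical, and no shuffling or stratification is applied.

```python
def k_fold_indices(n_samples: int, k: int):
    """Split sample indices 0..n_samples-1 into k (train, validation) pairs.

    Fold sizes differ by at most one; earlier folds get the extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    sizes = [n_samples // k + (1 if i < n_samples % k else 0)
             for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, validation))
        start += size
    return folds
```

A model such as logistic regression would be fitted on each `train` list and scored on the matching `validation` list, and the scores averaged.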

Improving the prediction accuracy for LDL-cholesterol based on semi-supervised learning (준지도학습 기반 LDL-콜레스테롤 예측의 정확도 개선)

  • Yang, Su-Bhin;Kim, Min-Tae;Kwon, Su-Bin;Woo, Na-Hyun;Kim, Hak-Jae;Jeong, Tai-Kyeong;Lee, Sung-Ju
    • Proceedings of the Korea Information Processing Society Conference / 2022.05a / pp.553-556 / 2022
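  • Early diagnosis and management of the onset of dyslipidemia is an important problem. Dyslipidemia is diagnosed using four blood measurements, LDL, HDL, TG, and TC, and estimating LDL is important for managing it. This paper proposes a machine learning method based on semi-supervised learning that learns from anthropometric information such as age, sex, and BMI to predict LDL-cholesterol. The proposed method uses a shallow-learning MLP (Multi-Layer Perceptron) and, considering the correlations among the diagnostic factors of dyslipidemia, uses HDL, TG, and TC values predicted from anthropometric information to improve on the accuracy of a conventional machine learning predictor. That is, the proposed method predicts each of the blood measurements LDL, HDL, TG, and TC from anthropometric information, and designs a semi-supervised shallow network trained on the anthropometric information augmented with the predicted blood measurements. Experiments confirm that semi-supervised LDL prediction using the predicted HDL, TG, and TC reaches an accuracy of 71.4%, an improvement of about 4.4 percentage points over the 67.0% of the method that uses anthropometric information alone.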
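
The two-stage idea above, where predicted blood measurements are appended to the anthropometric features before the LDL model is trained, can be sketched as a feature-augmentation step. Everything concrete here is hypothetical: the function name, the feature order, and the constant-returning stub predictors, which stand in for trained models and carry no medical meaning.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], float]


def augment_with_predictions(body: Sequence[float],
                             aux_predictors: Sequence[Predictor]) -> List[float]:
    """Append each auxiliary prediction to the anthropometric features."""
    return list(body) + [predict(body) for predict in aux_predictors]


# Stub predictors returning arbitrary constants (placeholders only).
def stub_hdl(body: Sequence[float]) -> float:
    return 1.0


def stub_tg(body: Sequence[float]) -> float:
    return 2.0


def stub_tc(body: Sequence[float]) -> float:
    return 3.0


# Hypothetical input: [age, sex, BMI].
features = augment_with_predictions([50.0, 1.0, 24.0],
                                    [stub_hdl, stub_tg, stub_tc])
```

The augmented vector, three measured values followed by three predicted ones, is what the second-stage LDL network would take as input in this sketch.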

Analysis and Prediction of Energy Consumption Using Supervised Machine Learning Techniques: A Study of Libyan Electricity Company Data

  • Ashraf Mohammed Abusida;Aybaba Hancerliogullari
    • International Journal of Computer Science & Network Security / v.23 no.3 / pp.10-16 / 2023
  • The ever-increasing amount of data generated by various industries and systems has led to the development of data mining techniques as a means to extract valuable insights and knowledge from such data. The electrical energy industry is no exception, given the large amounts of data generated by SCADA systems. This study focuses on the analysis of historical data recorded in the SCADA database of the Libyan Electricity Company. The database, spanning January 1, 2013 to December 31, 2022, contains records of date and hour, energy production, temperature, humidity, wind speed, and energy consumption levels. The data was pre-processed and analyzed using the WEKA tool and the Apriori algorithm, a supervised machine learning technique. The aim of the study was to extract association rules that would help decision-makers make informed decisions with greater efficiency and reduced costs. The results were evaluated in terms of accuracy and production time, and the study concludes that they are promising and encouraging for future use in the Libyan Electricity Company. The study highlights the importance of data mining and the benefits of using machine learning technology in decision-making processes.
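
Association rules such as those extracted above are usually scored by support and confidence. The sketch below computes both measures from their standard definitions; it does not implement the Apriori candidate search or WEKA's behavior, and the item names are hypothetical discretized readings, not data from the study.

```python
def support(transactions, itemset) -> float:
    """Fraction of transactions that contain every item in itemset."""
    if not transactions:
        return 0.0
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent) -> float:
    """Estimate P(consequent | antecedent) for the rule A -> C."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


# Hypothetical hourly records after discretization.
records = [
    frozenset({"temp=high", "load=high"}),
    frozenset({"temp=high", "load=high"}),
    frozenset({"temp=low", "load=low"}),
    frozenset({"temp=high", "load=low"}),
]
rule_confidence = confidence(records, {"temp=high"}, {"load=high"})
```

In these toy records the rule "temp=high implies load=high" holds in two of the three high-temperature hours.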

Classification Methods for Automated Prediction of Power Load Patterns (전력 부하 패턴 자동 예측을 위한 분류 기법)

  • Minghao, Piao;Park, Jin-Hyung;Lee, Heon-Gyu;Ryu, Keun-Ho
    • Proceedings of the Korean Information Science Society Conference / 2008.06c / pp.26-30 / 2008
  • This paper presents an automated methodology based on data mining techniques for predicting customer load patterns in long-duration load profiles. The proposed approach consists of three stages: (i) data pre-processing, in which noise and outliers are removed and continuous-valued features are transformed to discrete values; (ii) cluster analysis, in which k-means clustering is used to create load pattern classes and the representative load profile for each class; and (iii) classification, in which we evaluate several supervised learning methods in order to select a suitable prediction method. Power load measured by an AMR (automatic meter reading) system, together with customer indexes, was used as input for clustering. The output of clustering was the set of representative load profiles (classes). To evaluate the forecasting of load patterns, several classification methods were applied to a set of high-voltage customers of the Korean power system, with the class labels derived from clustering and other features used as input to build the classifiers. Finally, the results of our experiments are presented.
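
Stages (ii) and (iii) above can be sketched as: cluster the load profiles with k-means, then treat the cluster indices as class labels for a classifier. The code below is a minimal plain-Python k-means with a nearest-centroid rule standing in for the unspecified classifiers; the deterministic initialization, iteration count, function names, and toy profiles are all assumptions.

```python
def squared_distance(a, b) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(profile, centroids) -> int:
    """Index of the centroid closest to the given load profile."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(profile, centroids[c]))


def kmeans(points, k, iterations=20):
    """Cluster points; return (labels, centroids).

    Initial centroids are the first k points (assumed for determinism).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [nearest_centroid(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


# Hypothetical two-reading load profiles for four customers.
profiles = [[1.0, 1.2], [0.9, 1.1], [5.0, 5.2], [5.1, 4.9]]
labels, centroids = kmeans(profiles, k=2)
new_class = nearest_centroid([4.8, 5.0], centroids)
```

The `labels` list plays the role of the class labels derived from clustering, and `nearest_centroid` assigns a new customer to one of the representative load profiles.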


A Study on the Blockchain-Based Insurance Fraud Prediction Model Using Machine Learning (기계학습을 이용한 블록체인 기반의 보험사기 예측 모델 연구)

  • Lee, YongJoo
    • Journal of Convergence for Information Technology / v.11 no.6 / pp.270-281 / 2021
  • With the development of information technology, the scale of insurance fraud is increasing rapidly every year, and the methods are becoming more organized and sophisticated through collusion. Although various prediction models are being studied to predict and detect fraud, insurance-related information is highly sensitive, so sharing and accessing it carries high risk and faces many legal and technical constraints. In this paper, we propose a machine learning insurance fraud prediction model based on blockchain, one of the most popular technologies of the recent Fourth Industrial Revolution. We use blockchain technology to realize a safe and trusted insurance information sharing system, apply the theory of social relationship analysis for more efficient and accurate fraud prediction, and propose machine learning fraud prediction patterns in four stages. Claims with a high probability of fraud are detected at a higher prediction rate at an earlier stage, and claims with a low probability are handled differentially for later reference management. The core mechanism of the proposed model has been verified by constructing a local Ethereum network; more sophisticated performance evaluation is left for future work.