http://dx.doi.org/10.7232/JKIIE.2014.40.1.008

Using Data Mining Techniques to Predict Win-Loss in Korean Professional Baseball Games  

Oh, Younhak (Department of Systems Management Engineering, Sungkyunkwan University)
Kim, Han (Department of Systems Management Engineering, Sungkyunkwan University)
Yun, Jaesub (Department of Systems Management Engineering, Sungkyunkwan University)
Lee, Jong-Seok (Department of Systems Management Engineering, Sungkyunkwan University)
Publication Information
Journal of Korean Institute of Industrial Engineers / v.40, no.1, 2014, pp. 8-17
Abstract
In this research, we employed various data mining techniques to build predictive models for win-loss prediction in Korean professional baseball games. Historical data on players and teams were obtained from the official materials provided on the KBO website. From the collected raw data, we prepared two additional datasets, in ratio and binary format respectively. The ratio dataset was generated by dividing the away team's records by the corresponding home team's records, while the binary dataset was obtained by comparing the record values. We applied seven classification techniques to the three (raw, ratio, and binary) datasets: decision tree, random forest, logistic regression, neural network, support vector machine, linear discriminant analysis, and quadratic discriminant analysis. Among the 21 (= 3 datasets × 7 techniques) prediction scenarios, the most accurate model was obtained from the random forest technique on the binary dataset, with a prediction accuracy of 84.14%. We also observed that the ratio and binary datasets yielded better prediction models than the raw data. Using the variable selection capability of decision tree, random forest, and stepwise logistic regression, we found that annual salary, earned runs, strikeouts, pitcher's winning percentage, and bases on balls (walks) are important winning factors of a game. This research is distinct from existing studies in that it uses three different types of data and various data mining techniques for win-loss prediction in Korean professional baseball games.
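The ratio and binary datasets described in the abstract could be derived along the following lines. This is only a minimal sketch: the field names, the sample values, and the direction of the binary comparison (1 when the away team's value is larger) are assumptions made for illustration, since the abstract does not specify them.

```python
# Hypothetical per-game records for one matchup; the field names and
# values are illustrative only and are not taken from the paper's data.
away = {"era": 3.80, "strikeouts": 7.1, "win_pct": 0.550}
home = {"era": 4.20, "strikeouts": 6.5, "win_pct": 0.480}


def ratio_features(away_rec, home_rec):
    """Divide each away-team record by the matching home-team record."""
    # Assumes home-team values are nonzero; real data would need a guard.
    return {key: away_rec[key] / home_rec[key] for key in away_rec}


def binary_features(away_rec, home_rec):
    """Compare record values: 1 if the away-team value is larger, else 0.

    The comparison direction is an assumption, not stated in the abstract.
    """
    return {key: int(away_rec[key] > home_rec[key]) for key in away_rec}


ratio = ratio_features(away, home)
binary = binary_features(away, home)
```

Each resulting dictionary would form one row of the corresponding dataset, with the game outcome attached as the class label before a classifier is trained.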
Keywords
Professional Baseball; Win-Loss Prediction; Winning Factors; Data Mining; Classification Techniques