Using Data Mining Techniques to Predict Win-Loss in Korean Professional Baseball Games

(Korean title, translated: A Study on Building a Win-Loss Prediction Model for Korean Professional Baseball Using Data Mining)

  • Oh, Younhak (Department of Systems Management Engineering, Sungkyunkwan University) ;
  • Kim, Han (Department of Systems Management Engineering, Sungkyunkwan University) ;
  • Yun, Jaesub (Department of Systems Management Engineering, Sungkyunkwan University) ;
  • Lee, Jong-Seok (Department of Systems Management Engineering, Sungkyunkwan University)
  • Received : 2013.11.27
  • Accepted : 2014.01.09
  • Published : 2014.02.15

Abstract

In this research, we employed various data mining techniques to build models for predicting wins and losses in Korean professional baseball games. Historical data on players and teams was obtained from the official materials provided on the KBO website. From the collected raw data, we prepared two additional datasets, in ratio and binary format respectively. The ratio dataset was generated by dividing the away team's records by the corresponding home team's records, while the binary dataset was obtained by comparing the record values. We applied seven classification techniques to the three (raw, ratio, and binary) datasets: decision tree, random forest, logistic regression, neural network, support vector machine, linear discriminant analysis, and quadratic discriminant analysis. Among the 21 (= 3 datasets $\times$ 7 techniques) prediction scenarios, the most accurate model was obtained from the random forest technique on the binary dataset, with a prediction accuracy of 84.14%. We also observed that the ratio and binary datasets yielded better prediction models than the raw data. Using the variable selection capability of decision tree, random forest, and stepwise logistic regression, we found that annual salary, earned runs, strikeouts, pitcher's winning percentage, and bases on balls (walks) are important factors in winning a game. This research is distinct from existing studies in that we used three different types of data and a variety of data mining techniques for win-loss prediction in Korean professional baseball games.
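The construction of the ratio and binary datasets described above can be illustrated with a minimal sketch. The abstract states only that the ratio is the away-team record divided by the home-team record and that the binary dataset comes from comparing record values; the function names, the statistic names, the sample numbers, the encoding direction (1 when the away-team value is larger), and the handling of a zero denominator are all assumptions made here for illustration, not details taken from the paper.

```python
from typing import Dict

Record = Dict[str, float]


def ratio_features(away: Record, home: Record) -> Record:
    """Divide each away-team statistic by the matching home-team statistic."""
    out: Record = {}
    for name, away_value in away.items():
        home_value = home[name]
        # Assumption: a zero home-team value gives NaN instead of raising an error.
        out[name] = away_value / home_value if home_value != 0 else float("nan")
    return out


def binary_features(away: Record, home: Record) -> Dict[str, int]:
    """Encode each statistic as 1 if the away-team value is larger, else 0.

    Assumption: the paper does not state the encoding direction or how ties
    are treated; ties are mapped to 0 here.
    """
    return {name: int(away[name] > home[name]) for name in away}


# Made-up numbers, for illustration only.
away = {"earned_runs": 4.0, "strikeouts": 7.0, "walks": 3.0}
home = {"earned_runs": 5.0, "strikeouts": 7.0, "walks": 2.0}

ratio = ratio_features(away, home)
binary = binary_features(away, home)
print(ratio)
print(binary)
```

Both derived formats express each game as a comparison between the two teams rather than as two separate sets of raw values, which may be one reason they produced better models than the raw data in the reported results.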
