http://dx.doi.org/10.15207/JKCS.2021.12.7.169

A Study on Predicting Stock Prices of Hallyu Content Companies Using Two-Stage k-Means Clustering

Kim, Jeong-Woo (Dept. of Economics, Gangneung Wonju National University)
Publication Information
Journal of the Korea Convergence Society / v.12, no.7, 2021, pp. 169-179
Abstract
This study shows that a two-stage k-means clustering method can improve prediction performance, using stock price prediction as the test case. To this end, it introduces the two-stage k-means clustering algorithm and compares its prediction performance with that of various machine learning techniques. The method selects, from the clusters produced by k-means clustering, the cluster closest to the prediction target, and then reapplies k-means clustering within that cluster to find a cluster closer to the actual value. The predicted values of this method are closer to the actual stock price than those of the other machine learning techniques. Furthermore, the predictions remain relatively stable even though a relatively small cluster is used. The method can therefore improve the accuracy and stability of prediction simultaneously, and it can be considered a new clustering method useful for small data. Future work should extend two-stage k-means clustering to large-scale data applications.
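The two-stage idea summarized in the abstract can be sketched roughly as follows. This is a minimal illustration in pure Python, not the paper's implementation: the function names (`kmeans`, `nearest_members`, `two_stage_predict`), the cluster counts `k1` and `k2`, the evenly spaced centroid initialization, the Euclidean distance, and the choice to predict with the mean target of the final sub-cluster are all assumptions made for the example, since the abstract does not specify them.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns (centroids, labels).

    Initial centroids are k evenly spaced input points, so results are
    deterministic (an illustrative choice, not taken from the paper).
    """
    k = min(k, len(points))
    step = len(points) / k
    centroids = [list(points[int(i * step)]) for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def nearest_members(features, targets, query, k):
    """Cluster the rows and keep only the non-empty cluster nearest to query."""
    centroids, labels = kmeans(features, k)
    best = min(set(labels), key=lambda c: math.dist(query, centroids[c]))
    keep = [i for i, lab in enumerate(labels) if lab == best]
    return [features[i] for i in keep], [targets[i] for i in keep]


def two_stage_predict(features, targets, query, k1=3, k2=2):
    """Stage 1 narrows to one cluster; stage 2 re-clusters inside it."""
    f1, t1 = nearest_members(features, targets, query, k1)
    f2, t2 = nearest_members(f1, t1, query, k2)
    # Assumed prediction rule: average target of the final sub-cluster.
    return sum(t2) / len(t2)


# Toy data (hypothetical): one feature per observation and a price target.
features = [[1.0], [1.2], [1.4], [5.0], [5.2], [5.8], [6.0], [9.0], [9.2]]
targets = [10.0, 11.0, 12.0, 50.0, 52.0, 58.0, 60.0, 90.0, 92.0]
prediction = two_stage_predict(features, targets, query=[5.9])
print(prediction)  # 59.0 for this toy data
```

On the toy data, the first stage keeps the four observations around 5 to 6, and the second stage narrows them to the two closest to the query, which is how a small final cluster can still produce a stable estimate.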
Keywords
Clustering; Machine learning; Overfitting; Prediction; Time-series data