http://dx.doi.org/10.13088/jiis.2018.24.2.001

Online news-based stock price forecasting considering homogeneity in the industrial sector  

Seong, Nohyoon (KAIST College of Business, Korea Advanced Institute of Science and Technology (KAIST))
Nam, Kihwan (College of Business, Hanyang University)
Publication Information
Journal of Intelligence and Information Systems / v.24, no.2, 2018, pp. 1-19
Abstract
Because forecasting stock movements is an important problem both academically and practically, stock price prediction has been studied actively. Research on stock price forecasting is classified by whether it uses structured or unstructured data and, in more detail, is divided into technical analysis, fundamental analysis, and media effect analysis. In the big data era, research that combines stock price prediction with big data is actively underway, and this data-intensive research relies mainly on machine learning techniques. Methods that incorporate media effects have recently attracted particular attention, and among them, studies that analyze online news and use it to forecast stock prices are becoming mainstream. Most previous studies that predict stock prices from online news perform sentiment analysis of the news, build a separate corpus for each company, and construct a dictionary that predicts stock prices by recording how past prices responded. Existing studies have therefore examined the impact of online news on individual companies; for example, stock movements of Samsung Electronics are predicted using only news about Samsung Electronics. More recently, methods that consider influences among highly related companies have also been studied; for example, stock movements of Samsung Electronics are predicted using news about both Samsung Electronics and a closely related company such as LG Electronics. These studies examine the effect of news from a homogeneous industrial sector on an individual company, with homogeneous industries defined according to the Global Industry Classification Standard. In other words, the existing studies assume that the industries defined by the Global Industry Classification Standard are homogeneous.
However, existing studies are limited in that they do not take into account influential companies with high relevance or reflect the heterogeneity that exists within a single Global Industry Classification Standard sector. Our examination of various sectors shows that some industrial sectors are not homogeneous groups. To overcome this limitation, we propose a methodology that applies k-means clustering to reflect the heterogeneous effects within an industrial sector on stock prices. Multiple Kernel Learning, which is mainly used to integrate data with differing characteristics, employs several kernels, each of which receives a different data source and makes predictions from it. We used Multiple Kernel Learning to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel was assigned to predict stock prices from financial news variables of either the target firm or one of the industry groups produced by k-means cluster analysis. To validate the proposed methodology, we conducted experiments on three years of online news and stock prices. The results are as follows. (1) We confirmed that news about the industrial sector related to a target company also contains meaningful information for predicting that company's stock movements, and that the machine learning algorithm has better predictive power when news about relevant companies is considered together with the target company's own news. (2) It is important to vary the number of clusters according to the level of homogeneity in the industrial sector when predicting stock movements. That is, when stock prices within an industrial sector are homogeneous, it is important to use the relational effect at the level of the whole industry group, either without clustering or with only a small number of clusters; when stock prices within an industry group are heterogeneous, it is important to cluster the firms into groups. This study contributes by showing that firms classified under the same Global Industry Classification Standard sector are heterogeneous, and by suggesting that relevance should be defined through machine learning and statistical analysis rather than simply by the Global Industry Classification Standard. It also contributes by demonstrating the efficiency of a prediction model that reflects this heterogeneity.
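The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes firms are clustered on short vectors of daily returns, uses a deterministic farthest-first seeding for k-means, and combines one linear kernel per news source with fixed, equal weights, whereas Multiple Kernel Learning proper would learn those weights while training a kernel classifier. The firm names, return values, news feature vectors, and function names are all hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return [list(c) for c in centers]

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def linear_gram(rows):
    """Gram matrix of a linear kernel over per-day feature vectors."""
    return [[sum(a * b for a, b in zip(x, y)) for y in rows] for x in rows]

def combine_kernels(grams, weights):
    """Weighted sum of Gram matrices (weights are fixed here, not learned)."""
    n = len(grams[0])
    return [[sum(w * g[i][j] for w, g in zip(weights, grams)) for j in range(n)]
            for i in range(n)]

# Hypothetical daily returns for six firms in one sector.
returns = {
    "A": [0.010, 0.020, -0.010, 0.030],
    "B": [0.012, 0.018, -0.008, 0.028],
    "C": [0.009, 0.022, -0.012, 0.031],
    "D": [-0.020, 0.010, 0.030, -0.010],
    "E": [-0.018, 0.012, 0.028, -0.012],
    "F": [-0.022, 0.009, 0.031, -0.009],
}
names = list(returns)
labels = kmeans(list(returns.values()), k=2)

# Firms sharing the target firm's cluster supply the "relevant firm" news.
target = "A"
peers = [n for n, lab in zip(names, labels)
         if lab == labels[names.index(target)] and n != target]

# Hypothetical per-day news features: one source for the target firm and
# one aggregated over the firms in `peers`.
target_news = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
peer_news = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
combined = combine_kernels([linear_gram(target_news), linear_gram(peer_news)],
                           weights=[0.5, 0.5])
print(peers)
print(combined)
```

The combined Gram matrix could then be passed to any classifier that accepts a precomputed kernel. Choosing the number of clusters per sector, which the abstract identifies as depending on sector homogeneity, is left outside this sketch.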
Keywords
Stock Prediction; Text Mining; Machine Learning; Multiple Kernel Learning; Clustering