http://dx.doi.org/10.22937/IJCSNS.2022.22.2.1

Stock News Dataset Quality Assessment by Evaluating the Data Distribution and the Sentiment Prediction  

Alasmari, Eman (The Faculty of Computing and Information Technology, King Abdulaziz University)
Hamdy, Mohamed (The Faculty of Computing and Information Technology, King Abdulaziz University)
Alyoubi, Khaled H. (The Faculty of Computing and Information Technology, King Abdulaziz University)
Alotaibi, Fahd Saleh (The Faculty of Computing and Information Technology, King Abdulaziz University)
Publication Information
International Journal of Computer Science & Network Security / v.22, no.2, 2022, pp. 1-8
Abstract
This work provides a reliable, classified stock dataset merged with Saudi stock news. The dataset allows researchers to analyze and better understand the realities, impacts, and relationships between stock news and stock fluctuations. The data were collected from the Saudi stock market via the Corporate News (CN) and Historical Data Stocks (HDS) datasets. As their names suggest, CN contains news, and HDS records how stock values change over time. Both datasets cover the period from 2011 to 2019, have 30,098 rows, and have 16 variables, four of which they share and 12 of which differ. The combined dataset presented here therefore includes 30,098 published news pieces and information about stock fluctuations across nine years. Native Arabic speakers associated with the stock domain have interpreted stock news polarity in various ways, so the polarity was categorized manually based on Arabic semantics. As the Saudi stock market contributes substantially to the international economy, this dataset is essential for stock investors and analysts. The dataset has been prepared for educational and scientific purposes, motivated by the scarcity of data describing the impact of Saudi stock news on stock activities. It will therefore be useful across many fields, including stock market analytics, data mining, statistics, machine learning, and deep learning. The dataset is evaluated in two ways: by testing the distribution of the data over classes and by measuring sentiment prediction accuracy. The results show that polarity is distributed in a balanced way across sectors. A Naive Bayes (NB) model was developed to evaluate data quality through sentiment classification; it achieved 68% accuracy, which supports the reliability of the data. The evaluation results thus support the dataset's reliability, readiness, and quality for use.
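The two evaluation steps named in the abstract, checking the class distribution and measuring sentiment prediction accuracy with a Naive Bayes model, can be illustrated with a minimal sketch. The abstract does not state which NB variant, features, or train/test split were used, so everything below is an assumption: a multinomial model with add-one smoothing over whitespace tokens, and a handful of invented English placeholder headlines rather than rows from the actual Arabic dataset. The names `class_distribution`, `NaiveBayes`, and `accuracy` are hypothetical.

```python
from collections import Counter, defaultdict
import math


def class_distribution(labels):
    """Return the share of each label, e.g. {'positive': 0.5, 'negative': 0.5}."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (assumed variant)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Invented placeholder headlines (NOT taken from the CN/HDS dataset).
train_texts = [
    "profit rises strongly",
    "dividend increase announced",
    "loss widens sharply",
    "revenue falls again",
]
train_labels = ["positive", "positive", "negative", "negative"]
test_texts = ["profit increase", "loss falls"]
test_labels = ["positive", "negative"]

distribution = class_distribution(train_labels)
model = NaiveBayes().fit(train_texts, train_labels)
test_accuracy = accuracy(model, test_texts, test_labels)

print(distribution)
print(test_accuracy)
```

On real data, the same two quantities would be computed per sector and on a held-out split; the toy corpus here is only large enough to show the mechanics, and its accuracy says nothing about the 68% figure reported above.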
Keywords
Stock Dataset; Stock Market News; News Impact; Stock Activities; Data Quality Assessment;