Applications of the Text Mining Approach to Online Financial Information

  • Received : 2022.02.25
  • Accepted : 2022.08.16
  • Published : 2022.12.31

Abstract

With the development of deep learning techniques, text mining has delivered breakthrough performance improvements, promising future applications, and practical use cases across many fields. In the field of financial information, however, although several attempts have been made, few apply current technological trends. Recently, companies and government agencies have begun to research and apply text mining to financial information. In this study, we first survey prior work that uses text mining to show what has been studied in the financial sector. Second, to broaden the view of financial applications, we describe several text mining techniques that can be used in the field of financial information and summarize the paradigms in which these technologies can be applied. Third, we present practical cases of applying the latest text mining techniques to financial information, giving more tangible guidance to those who will use text mining in finance. Lastly, we propose potential future research topics in the field of financial information and present research methods and utilization plans for them. This study can motivate researchers studying financial issues to use text mining techniques to gain new insights from the rich information hidden in text data and to improve their work.

Acknowledgement

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2021S1A3A2A02089039).
