http://dx.doi.org/10.13088/jiis.2022.28.2.237

The prediction of the stock price movement after IPO using machine learning and text analysis based on TF-IDF  

Yang, Suyeon (School of Management Engineering, College of Business, KAIST)
Lee, Chaerok (School of Business, Pusan National University)
Won, Jonggwan (School of Business, Pusan National University)
Hong, Taeho (School of Business, Pusan National University)
Publication Information
Journal of Intelligence and Information Systems / v.28, no.2, 2022, pp. 237-262
Abstract
There has been growing interest in IPOs (Initial Public Offerings) because of the profitable returns that IPO stocks can offer investors. However, IPOs can also be speculative investments involving substantial risk, because their shares tend to be volatile and the supply of IPO shares is often highly limited. It is therefore crucial that IPO investors be well informed about the issuing firms and the market before deciding whether to invest. Individual investors are at a disadvantage compared with institutional investors, because they have few opportunities to obtain information on IPOs. The purpose of this study is to provide individual investors with information they can consider when making IPO investment decisions. This study presents a model that uses machine learning and text analysis to predict whether an IPO stock price will move up or down after the first five trading days. Our sample includes 691 Korean IPOs from June 2009 to December 2020. The input variables for the prediction are three tone variables derived from IPO prospectuses and quantitative variables that are firm-specific, issue-specific, or market-specific. The three tone variables are the percentages of positive, neutral, and negative sentences in a prospectus, respectively. Only the sentences in the Risk Factors section of each prospectus were considered in the tone analysis. All sentences were classified as 'positive', 'neutral', or 'negative' through text analysis based on TF-IDF (Term Frequency-Inverse Document Frequency). The tone of each sentence was measured with machine learning rather than a lexicon-based approach, because sentiment dictionaries suitable for analyzing Korean financial text are lacking.
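The abstract states only that sentences were represented with TF-IDF before classification; it does not specify the tokenizer or the exact weighting variant. As a minimal sketch under assumed choices (each sentence treated as one document, whitespace tokenization, raw term counts as TF, and an unsmoothed IDF of log(N/df)), the weights can be computed as follows. The function name `tfidf_vectors` and the example sentences are hypothetical and not taken from the study.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Hypothetical helper: TF-IDF weights for a list of sentences.

    Assumptions (not from the paper): whitespace tokenization, raw term
    counts as TF, and IDF = log(N / df) with no smoothing or normalization.
    Returns the sorted vocabulary and one {term: weight} dict per sentence.
    """
    tokenized = [s.split() for s in sentences]
    n_docs = len(tokenized)

    # Document frequency: number of sentences containing each term at least once.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vocab = sorted(df)
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    # Weight = (count of the term in this sentence) * idf(term).
    vectors = [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vocab, vectors


# Illustrative, made-up risk-factor-style sentences.
sentences = [
    "demand may decline",
    "demand may grow strongly",
    "sales may decline sharply",
]
vocab, vectors = tfidf_vectors(sentences)
print(vectors[0])
```

A term that occurs in every sentence (here, "may") receives zero weight, which is the intended effect of the IDF factor. In practice, Korean sentences are usually segmented with a morphological analyzer rather than split on spaces, and scikit-learn's `TfidfVectorizer` defaults to a smoothed IDF with L2-normalized rows, so weights from this sketch will not match such tools exactly.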
To build the training set, 10% of the sentences in each prospectus were randomly selected and labeled manually by reading each sentence. A Support Vector Machine model trained on this set was then used to predict the tone of the sentences in the test set. Finally, the percentages of positive, neutral, and negative sentences in each prospectus were calculated from the model's classifications. To predict the price movement of an IPO stock, four machine learning techniques were applied: Logistic Regression, Random Forest, Support Vector Machine, and Artificial Neural Network. The results show that models combining quantitative variables based on technical analysis with prospectus tone variables achieve higher accuracy than models that use only quantitative variables. Specifically, prediction accuracy improved by 1.45 percentage points in the Random Forest model, 4.34 percentage points in the Artificial Neural Network model, and 5.07 percentage points in the Support Vector Machine model. Among all models tested, the Artificial Neural Network using both quantitative variables and prospectus tone variables achieved the highest prediction accuracy, 61.59%. These results indicate that the tone of a prospectus is a significant factor in predicting the price movement of an IPO stock. In addition, the McNemar test was used to check whether the differences between models were statistically significant. Comparing the model using only quantitative variables with the model using both quantitative and prospectus tone variables confirmed that the improvement in predictive performance was significant at the 1% level.
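The abstract reports that the McNemar test was used to compare the quantitative-only model with the model that adds prospectus tone variables, but it does not say which variant of the test was applied. The sketch below assumes the common chi-square form with a continuity correction, chi2 = (|b - c| - 1)^2 / (b + c), computed on a shared test set from the two discordant counts: b, the cases only model A classifies correctly, and c, the cases only model B classifies correctly. The function `mcnemar_test` and the toy predictions are hypothetical.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b, continuity=True):
    """Hypothetical helper: McNemar's test for two classifiers on the same cases.

    b = cases model A classifies correctly and model B misclassifies;
    c = cases model A misclassifies and model B classifies correctly.
    Returns (chi-square statistic, p-value) using 1 degree of freedom.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("inputs must have the same length")

    b = sum(1 for y, ya, yb in zip(y_true, pred_a, pred_b) if ya == y and yb != y)
    c = sum(1 for y, ya, yb in zip(y_true, pred_a, pred_b) if ya != y and yb == y)
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree, so there is no evidence of a difference

    # Continuity-corrected numerator, clipped at zero so that b == c yields 0.
    diff = max(abs(b - c) - (1 if continuity else 0), 0)
    stat = diff ** 2 / (b + c)

    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Toy example (1 = price up): model A is right on 20 cases where B is wrong,
# and B is right on 2 cases where A is wrong.
y_true = [1] * 22
pred_a = [1] * 20 + [0] * 2
pred_b = [0] * 20 + [1] * 2
stat, p = mcnemar_test(y_true, pred_a, pred_b)
print(f"chi2 = {stat:.3f}, p = {p:.5f}")
```

Cases that both models get right, or both get wrong, do not enter the statistic, which is why the test suits paired comparisons of classifiers evaluated on the same IPOs. When b + c is small, an exact binomial version of the test is often preferred over the chi-square approximation.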
Keywords
Stock Price Prediction; IPO; TF-IDF; Machine Learning; Tone Analysis
Citations & Related Records
Times Cited By KSCI: 4