• Title/Summary/Keyword: stock index prediction

Optimization of Support Vector Machines for Financial Forecasting (재무예측을 위한 Support Vector Machine의 최적화)

  • Kim, Kyoung-Jae;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.241-254 / 2011
  • Financial time-series forecasting is an important problem because it is essential to the risk management of financial institutions. Researchers have therefore tried to forecast financial time series with various data mining techniques such as regression, artificial neural networks, decision trees, and k-nearest neighbor. Recently, support vector machines (SVMs) have become popular in this area because they do not require large amounts of training data and have a low risk of overfitting. However, a user must determine several design factors heuristically in order to use an SVM. For example, the choice of kernel function and its parameters and the selection of a proper feature subset are major design factors. Beyond these, selecting a proper subset of instances may also improve forecasting performance by eliminating irrelevant and distorting training instances. Nonetheless, few studies have applied instance selection to SVMs, especially in the domain of stock market prediction. Instance selection tries to choose a proper subset of instances from the original training data. It may be considered a method of knowledge refinement that maintains the instance base. This study proposes a novel instance selection algorithm for SVMs. The proposed technique uses a genetic algorithm (GA) to optimize instance selection and kernel parameters simultaneously. We call the model ISVM (SVM with Instance selection). Experiments on stock market data are carried out using ISVM. The GA searches for optimal or near-optimal values of the kernel parameters and the relevant instances for the SVM. The GA chromosome therefore needs two sets of codes: one for the kernel parameters and one for instance selection. 
For the controlling parameters of the GA search, the population size is set at 50 organisms, the crossover rate at 0.7, and the mutation rate at 0.1. As the stopping condition, 50 generations are permitted. The application data consist of technical indicators and the direction of change in the daily Korea stock price index (KOSPI). The total number of samples is 2218 trading days. We split the data into three subsets: training, test, and hold-out. The subsets contain 1056, 581, and 581 samples, respectively. This study compares ISVM with several models: logistic regression (Logit), backpropagation neural networks (ANN), nearest neighbor (1-NN), conventional SVM (SVM), and SVM with optimized parameters (PSVM). In particular, PSVM uses kernel parameters optimized by the genetic algorithm. The experimental results show that ISVM outperforms 1-NN by 15.32%, ANN by 6.89%, Logit and SVM by 5.34%, and PSVM by 4.82% on the hold-out data. ISVM uses only 556 of the 1056 original training instances to produce this result. In addition, the two-sample test for proportions is used to examine whether ISVM significantly outperforms the other models. The results indicate that ISVM outperforms ANN and 1-NN at the 1% statistical significance level and performs better than Logit, SVM, and PSVM at the 5% level.
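
The two-sample test for proportions mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the pooled-variance z statistic and the two-sided p-value are assumptions about which variant of the test was used, and the hit counts are hypothetical, since the abstract reports only accuracy differences. Only the hold-out size of 581 comes from the abstract.

```python
import math


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-sample z-test for a difference in proportions.

    Returns (z, two-sided p-value). Assumes the pooled proportion
    is strictly between 0 and 1, otherwise the standard error is zero.
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical hit counts on a 581-sample hold-out set (not from the paper).
HOLDOUT = 581
z, p = two_proportion_z(380, HOLDOUT, 350, HOLDOUT)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

A one-sided version (testing only whether model A is better) would halve the p-value when z is positive.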

Machine learning-based corporate default risk prediction model verification and policy recommendation: Focusing on improvement through stacking ensemble model (머신러닝 기반 기업부도위험 예측모델 검증 및 정책적 제언: 스태킹 앙상블 모델을 통한 개선을 중심으로)

  • Eom, Haneul;Kim, Jaeseong;Choi, Sangok
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.105-129 / 2020
  • This study uses corporate data from 2012 to 2018, when K-IFRS was applied in earnest, to predict default risk. The data used in the analysis total 10,545 rows and 160 columns, including 38 from the statement of financial position, 26 from the statement of comprehensive income, 11 from the statement of cash flows, and 76 financial ratio indices. Unlike most previous studies, which used the default event as the basis for learning about default risk, this study calculated default risk from each company's market capitalization and stock price volatility based on the Merton model. This resolves the data imbalance caused by the scarcity of default events, which had been pointed out as a limitation of the existing methodology, and makes it possible to reflect the differences in default risk that exist among ordinary companies. Because learning was conducted using only corporate information that is also available for unlisted companies, default risks of unlisted companies without stock price information can be appropriately derived. The model can therefore provide stable default risk assessment for unlisted companies, such as small and medium-sized companies and startups, whose default risk is difficult to determine with traditional credit rating models. Although corporate default risk prediction using machine learning has been actively studied recently, model bias remains an issue because most studies make predictions based on a single model. A stable and reliable valuation methodology is required for calculating default risk, given that an entity's default risk information is very widely used in the market and sensitivity to differences in default risk is high. Strict standards are also required for the calculation methods. 
The credit rating method stipulated by the Financial Services Commission in the Financial Investment Regulations calls for the preparation of evaluation methods, including verification of their adequacy, in consideration of past statistical data and experience with credit ratings and of changes in future market conditions. This study reduced the bias of individual models by using a stacking ensemble technique that synthesizes various machine learning models. This makes it possible to capture complex nonlinear relationships between default risk and various corporate information and to maximize the advantages of machine learning-based default risk prediction models, which take less time to calculate. To produce the sub-model forecasts used as input data for the stacking ensemble model, the training data were divided into seven pieces, and the sub-models were trained on the divided sets to produce forecasts. To compare the predictive power of the stacking ensemble model, Random Forest, MLP, and CNN models were trained on the full training data, and the predictive power of each model was then verified on the test set. The analysis showed that the stacking ensemble model exceeded the predictive power of the Random Forest model, which had the best performance among the single models. Next, to check for statistically significant differences between the forecasts of the stacking ensemble model and those of each individual model, pairs were constructed between the stacking ensemble model and each individual model. Because the Shapiro-Wilk normality test showed that none of the pairs followed a normal distribution, the nonparametric Wilcoxon rank sum test was used to check whether the two sets of forecasts in each pair differed significantly. The analysis showed that the forecasts of the stacking ensemble model differed significantly from those of the MLP and CNN models. 
In addition, because traditional credit rating models can also be included as sub-models when calculating the final default probability, this study offers a methodology that allows existing credit rating agencies to adopt machine learning-based bankruptcy risk prediction. The stacking ensemble technique proposed in this study can also help satisfy the requirements of the Financial Investment Business Regulations through the combination of various sub-models. We hope that this research will serve as a resource for increasing practical use by overcoming and improving on the limitations of existing machine learning-based models.
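
The seven-way split used to produce sub-model forecasts for the stacking ensemble can be sketched as out-of-fold prediction. This is an illustrative sketch only: the two sub-models (`fit_mean`, `fit_line`) are toy stand-ins rather than the models used in the paper, the contiguous fold layout is an assumption, and the data are synthetic.

```python
from statistics import mean


def k_fold_indices(n, k=7):
    """Split range(n) into k contiguous folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_mean(xs, ys):
    """Toy sub-model: always predicts the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Toy sub-model: single-feature least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def out_of_fold(xs, ys, fitters, k=7):
    """Each row gets forecasts from sub-models that never saw that row."""
    n = len(xs)
    oof = [[None] * len(fitters) for _ in range(n)]
    for held_out in k_fold_indices(n, k):
        held = set(held_out)
        train_x = [xs[i] for i in range(n) if i not in held]
        train_y = [ys[i] for i in range(n) if i not in held]
        for j, fit in enumerate(fitters):
            model = fit(train_x, train_y)
            for i in held_out:
                oof[i][j] = model(xs[i])
    return oof


# Synthetic data: y = 2x + 1 on 21 points (three per fold).
xs = [float(i) for i in range(21)]
ys = [2.0 * x + 1.0 for x in xs]
meta_features = out_of_fold(xs, ys, [fit_mean, fit_line])
```

The rows of `meta_features` would then serve as the inputs on which a meta-model is trained, so that the meta-model learns from forecasts made on data the sub-models had not seen.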

A Study on the Market Efficiency with Different Maturity in the Futures Markets (선물시장의 만기별 시장효율성에 관한 연구 - 베이시스간의 정보효과를 이용하여 -)

  • Seo, Sang-Gu;Park, Joung-Hae
    • Management & Information Systems Review / v.35 no.2 / pp.273-284 / 2016
  • The objective of this study is to analyze market efficiency in the futures markets. Although many previous studies have investigated market efficiency between spot and futures prices, efficiency across different maturities has not been studied extensively. To this end, this paper examines the KOSPI200 stock index futures market with different maturities. We analyze the dynamic serial relationship of the difference in basis between the nearest-month contract and the next nearest-month contract using the dynamic regression analysis suggested by Kawamoto and Hamori (2011). Using data from January 2000 to December 2013, the major empirical findings are as follows. First, the mean and standard deviation of the basis of the next nearest-month contract are larger than those of the nearest-month contract. Second, the t-period basis of the nearest-month contract can be explained by its (t-1)-period basis. Third, the basis spreads of period t and period (t-1) have a negative effect on the return of the underlying asset. This result is reasonable because the two basis spreads are derived from the same underlying asset. Finally, basis information from the next nearest-month contract can be used to predict the nearest-month contract and the spot market return.
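
The basis quantities discussed in the abstract can be sketched as below. This is a simplified illustration, not the dynamic regression of Kawamoto and Hamori (2011): it assumes the basis is defined as futures price minus spot price, fits only a plain lag-1 least-squares line, and uses hypothetical prices.

```python
from statistics import mean


def basis(futures, spot):
    """Basis per period, assumed here to be futures price minus spot price."""
    return [f - s for f, s in zip(futures, spot)]


def ols(x, y):
    """Simple least squares y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def lag1_fit(series):
    """Regress the period-t value on the period-(t-1) value."""
    return ols(series[:-1], series[1:])


# Hypothetical prices (not from the paper).
spot = [250.0, 251.0, 249.5, 252.0, 253.0, 251.5]
near = [250.8, 251.6, 250.4, 252.7, 253.9, 252.1]       # nearest-month futures
next_near = [251.9, 252.8, 251.2, 253.9, 254.8, 253.4]  # next nearest-month futures

near_basis = basis(near, spot)
next_basis = basis(next_near, spot)
# Basis spread between the two maturities, period by period.
spread = [b2 - b1 for b1, b2 in zip(near_basis, next_basis)]
intercept, slope = lag1_fit(near_basis)
```

A slope clearly different from zero in `lag1_fit(near_basis)` would correspond to the second finding, that the period-t basis is explained by the period-(t-1) basis.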


Assessment and Prediction of Stand Yield in Cryptomeria japonica Stands (삼나무 임분수확량 평가 및 예측)

  • Son, Yeong Mo;Kang, Jin Taek;Hwang, Jeong Sun;Park, Hyun;Lee, Kang Su
    • Journal of Korean Society of Forest Science / v.104 no.3 / pp.421-426 / 2015
  • The objective of this paper is to examine the growth of Cryptomeria japonica stands in South Korea, evaluate their yields, and estimate their carbon stocks and removals. A total of 106 sample plots were selected from Jeonnam, Gyeongnam, and Jeju, where the standard stands are grown. After excluding outliers, data from 92 plots were used. As part of the analysis, the Weibull diameter distribution was applied. To estimate the diameter distribution, growth estimation equations were developed for each growth factor, including height, diameter at breast height, and basal area, and each equation was verified. A site index for assessing the forest productivity of Cryptomeria japonica stands in each district was also developed using a Schumacher model, with 30 yr as the reference age. The site index for Cryptomeria japonica stands in South Korea was found to range from 10 to 16, and this result was used as the standard for developing the stand yield table. According to site index 14 in the stand yield table, the mean annual increment (MAI) of Cryptomeria japonica reaches $7.6m^3/ha$ at age 25 yr, and its growing stock is estimated at $190.1m^3/ha$. This volume is about $20m^3$ higher than that of Chamaecyparis obtusa. Furthermore, the annual carbon absorption of a Cryptomeria japonica stand peaks at 25 yr, at 2.14 tC/ha/yr ($7.83tCO_2/ha/yr$). Compared with other conifers, this rate is slightly higher than that of Chamaecyparis obtusa ($7.5tCO_2/ha/yr$) but lower than those of Pinus koraiensis ($10.4tCO_2/ha/yr$) and Larix kaempferi ($11.2tCO_2/ha/yr$). Based on these results, ways to enhance the utilization of Cryptomeria japonica as timber need to be developed, in addition to making use of its growth data.
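
The arithmetic behind the reported figures can be checked with a short sketch. It assumes the usual definition of MAI as growing stock divided by stand age, and the standard molar-mass ratio 44/12 for converting carbon to CO2; the function names are illustrative.

```python
CO2_PER_C = 44.0 / 12.0  # molar mass of CO2 divided by molar mass of C


def mean_annual_increment(growing_stock_m3_ha, age_yr):
    """MAI in m^3/ha/yr, assuming MAI = growing stock / stand age."""
    return growing_stock_m3_ha / age_yr


def carbon_to_co2(t_carbon):
    """Convert tonnes of carbon to tonnes of CO2."""
    return t_carbon * CO2_PER_C


mai = mean_annual_increment(190.1, 25)  # figures from the abstract
co2 = carbon_to_co2(2.14)               # tC/ha/yr from the abstract
print(round(mai, 1), round(co2, 2))
```

The MAI comes out at about 7.6, matching the abstract. The CO2 figure comes out at about 7.85 rather than the reported 7.83, presumably because the 2.14 tC/ha/yr value is itself rounded.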

Development of a Detection Model for the Companies Designated as Administrative Issue in KOSDAQ Market (KOSDAQ 시장의 관리종목 지정 탐지 모형 개발)

  • Shin, Dong-In;Kwahk, Kee-Young
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.157-176 / 2018
  • The purpose of this research is to develop a model that uses financial data to detect companies designated as administrative issues in the KOSDAQ market. The administrative issue designation marks companies with a high potential for delisting and gives them time, under certain restrictions of the Korean stock market, to resolve the reasons for delisting. It acts as an alarm that informs investors and market participants which companies are likely to be delisted and warns them to invest safely. Despite this importance, there are relatively few studies on predicting administrative issue designation compared with the many studies on bankruptcy prediction. Therefore, this study develops and verifies a detection model for companies designated as administrative issues using financial data of KOSDAQ companies. Logistic regression and decision trees are proposed as the data mining models. According to the results, the logistic regression model predicted designated companies using three variables, ROE (Earnings before tax), Cash flows/Shareholder's equity, and Asset turnover ratio, and its overall accuracy was 86% on the validation dataset. The decision tree (Classification and Regression Trees, CART) model applied classification rules using Cash flows/Total assets and ROA (Net income), and its overall accuracy reached 87%. The implications of the financial indicators selected in our logistic regression and decision tree models are as follows. First, ROE (Earnings before tax) in the logistic detection model reflects the profit and loss of continuing business segments, excluding the revenue and expenses of discontinued businesses. A decline in this variable therefore means that the competitiveness of the core business has weakened. 
If a large part of the profits comes from one-off gains, the deterioration of business management is very likely to intensify further. As the ROE of a KOSDAQ company decreases significantly, the company becomes highly likely to be delisted. Second, cash flows to shareholder's equity represents the firm's ability to generate cash flow when the financial condition of subsidiaries is excluded. In other words, a weakening of the parent company's management capacity, apart from the subsidiaries' competence, can be a main reason for an increased possibility of administrative issue designation. Third, a low asset turnover ratio means that current and non-current assets are used ineffectively by the corporation, or that its asset investment is excessive. If the asset turnover ratio of a KOSDAQ-listed company decreases, its corporate activities need to be examined in detail from various perspectives, such as weakening sales or changes in inventories. Cash flow/total assets, a variable selected by the decision tree detection model, is a key indicator of the company's cash condition and its ability to generate cash from operating activities. Cash flow indicates whether a firm can perform its main activities (maintaining its operating ability, repaying debts, paying dividends, and making new investments) without relying on external financial resources. Therefore, a negative value of this variable indicates that the company may have serious problems in its business activities. If the cash flow from operating activities of a company is smaller than its net profit, the net profit has not been converted into cash, which indicates a serious problem in managing the company's trade receivables and inventories. 
Therefore, as cash flows/total assets decreases, the probability of administrative issue designation and the probability of delisting increase. In summary, the logistic regression-based detection model in this study was found to be driven by the company's financial activities, including ROE (Earnings before tax), whereas the decision tree-based detection model predicts the designation based on the company's cash flows.
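
The five ratios named in the abstract can be computed from statement items roughly as follows. This is a sketch under stated assumptions: the field names are hypothetical, "cash flows" is taken to mean operating cash flow, the ratio definitions are the conventional ones rather than the paper's exact formulas, and the sample figures are invented. No model coefficients or tree thresholds are shown because the abstract does not report them.

```python
from dataclasses import dataclass


@dataclass
class Financials:
    """Hypothetical statement items for one company-year."""
    earnings_before_tax: float
    net_income: float
    operating_cash_flow: float
    sales: float
    total_assets: float
    shareholders_equity: float


def detection_ratios(f):
    """Ratios used by the two detection models, under assumed definitions."""
    return {
        # Logistic regression inputs
        "roe_ebt": f.earnings_before_tax / f.shareholders_equity,
        "cash_flow_to_equity": f.operating_cash_flow / f.shareholders_equity,
        "asset_turnover": f.sales / f.total_assets,
        # Decision tree (CART) inputs
        "cash_flow_to_assets": f.operating_cash_flow / f.total_assets,
        "roa_net_income": f.net_income / f.total_assets,
    }


def warning_flags(f):
    """Two qualitative signals described in the abstract."""
    return {
        "negative_cash_flow": f.operating_cash_flow < 0,
        "cash_flow_below_net_income": f.operating_cash_flow < f.net_income,
    }


# Invented figures for illustration only.
firm = Financials(
    earnings_before_tax=-12.0,
    net_income=-15.0,
    operating_cash_flow=-8.0,
    sales=90.0,
    total_assets=200.0,
    shareholders_equity=80.0,
)
ratios = detection_ratios(firm)
flags = warning_flags(firm)
```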

Development of a Distribution Prediction Model by Evaluating Environmental Suitability of the Aconitum austrokoreense Koidz. Habitat (세뿔투구꽃의 서식지 환경 적합성 평가를 통한 분포 예측 모형 개발)

  • Cho, Seon-Hee;Lee, Kye-Han
    • Journal of Korean Society of Forest Science / v.110 no.4 / pp.504-515 / 2021
  • To examine the relationship between the environmental factors influencing the habitat of Aconitum austrokoreense Koidz., this study employed the MaxEnt model to evaluate 21 environmental factors. Fourteen environmental factors having an AUC of at least 0.6 were found to be the age of stand, growing stock, altitude, topography, topographic wetness index, solar radiation, soil texture, mean temperature in January, mean temperature in April, mean annual temperature, mean rainfall in January, mean rainfall in August, and mean annual rainfall. Based on the response curves of the 14 descriptive factors, sites on Baekun Mountain at an altitude of 600 m or lower were deemed more suitable for Aconitum austrokoreense Koidz., and habitats were not significantly affected by the inclination angle. The preferred conditions were high stand density, sites close to valleys, and distribution in the northwestern direction. Under the five-age class system, the species was more likely to be observed in the lower classes. The preferred solar radiation in this study was 1.2 MJ/m2. The species was less likely to be observed when the topographic wetness index fell below the reference value of 4.5 and more likely to be observed above 7.5 (the threshold reference). Soil analysis showed that Aconitum austrokoreense Koidz. was more likely to thrive in sandy loam than in clay. Suitable conditions were a mean January temperature of -4.4℃ to -2.5℃, a mean April temperature of 8.8℃-10.0℃, and a mean annual temperature of 9.6℃-11.0℃. Aconitum austrokoreense Koidz. was first observed at sites with a mean annual rainfall of 1,670-1,720 mm and a mean August rainfall of at least 350 mm, and sites with rainfall increasing up to 390 mm were preferred. The area of potential habitats having distributive significance of 75% or higher was 202 ha, or 1.8% of the area covered in this study.
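
The screening of factors by an AUC threshold of 0.6 can be illustrated with a small rank-based AUC calculation. This is a sketch only and does not reproduce MaxEnt: it assumes that each factor already has suitability scores at presence sites and at background sites, and the factor names and scores below are hypothetical.

```python
def auc(presence, background):
    """Probability that a random presence score exceeds a random
    background score, counting ties as one half."""
    wins = 0.0
    for p in presence:
        for b in background:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence) * len(background))


def select_factors(scores, threshold=0.6):
    """Keep factors whose AUC meets the threshold (0.6 in the abstract)."""
    return [
        name
        for name, (presence, background) in scores.items()
        if auc(presence, background) >= threshold
    ]


# Hypothetical suitability scores per factor: (presence sites, background sites).
scores = {
    "altitude": ([0.8, 0.7, 0.9], [0.2, 0.6, 0.75]),
    "inclination": ([0.5, 0.4], [0.5, 0.6]),
}
selected = select_factors(scores)
```

With these invented scores only `altitude` passes the threshold, which mirrors how a factor with little discriminating power would be dropped.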