• Title/Summary/Keyword: forecasting models

Dynamic forecasts of bankruptcy with Recurrent Neural Network model (RNN(Recurrent Neural Network)을 이용한 기업부도예측모형에서 회계정보의 동적 변화 연구)

  • Kwon, Hyukkun;Lee, Dongkyu;Shin, Minsoo
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.139-153
    • /
    • 2017
  • Corporate bankruptcy can cause great losses not only to stakeholders but also to many related sectors of society. Bankruptcies have increased through successive economic crises, and bankruptcy prediction models have become increasingly important. Corporate bankruptcy has therefore been regarded as one of the major research topics in business management, and many studies are also under way in industry. Previous studies applied various methodologies to improve prediction accuracy and to resolve the overfitting problem, including statistical methods such as Multivariate Discriminant Analysis (MDA) and the Generalized Linear Model (GLM). More recently, researchers have used machine learning methodologies such as the Support Vector Machine (SVM) and the Artificial Neural Network (ANN), as well as fuzzy theory and genetic algorithms. As a result, many bankruptcy models have been developed and their performance has improved. In general, a company's financial and accounting information changes over time, as does the market situation, so it is difficult to predict bankruptcy using only information from a single point in time. However, although traditional research does not take the time effect into account, dynamic models have not been studied much. Ignoring the time effect yields biased results, so a static model may not be suitable for predicting bankruptcy, and a dynamic model may improve bankruptcy prediction. In this paper, we propose the Recurrent Neural Network (RNN), one of the deep learning methodologies. The RNN learns from time series data and is known to perform well. For the experiment, we selected non-financial firms listed on the KOSPI, KOSDAQ and KONEX markets from 2010 to 2016 to estimate the bankruptcy prediction model and to compare forecasting performance. 
To avoid predicting bankruptcy from financial information that already reflects the deterioration of a company's financial condition, the financial information was collected with a lag of two years, and the default period was defined as January to December of the year. We defined bankruptcy as delisting due to sluggish earnings, and confirmed delistings on KIND, a corporate stock information website. We then selected variables from previous papers. The first set consists of Z-score variables, which have become traditional variables in bankruptcy prediction. The second set is a dynamic variable set. In the end, we selected 240 normal companies and 226 bankrupt companies for the first variable set, and 229 normal companies and 226 bankrupt companies for the second. We built a model that reflects dynamic changes in time-series financial data and, by comparing it with existing bankruptcy prediction models, found that it could help improve the accuracy of bankruptcy predictions. We used financial data from KIS Value (a financial database) and selected Multivariate Discriminant Analysis (MDA), the Generalized Linear Model in the form of logistic regression (GLM), the Support Vector Machine (SVM), and the Artificial Neural Network (ANN) as benchmarks. The experiment showed that the RNN performed better than the comparison models. The accuracy of the RNN was high for both sets of variables, and the Area Under the Curve (AUC) value was also high. In the hit-ratio table, the proportion of failing companies that the RNN predicted to be bankrupt was higher than that of the comparison models. A limitation of this paper, however, is that overfitting occurs during RNN training. 
We expect that this overfitting problem can be addressed by using more training data and selecting appropriate variables. From these results, we expect this research to contribute to the development of bankruptcy prediction by proposing a new dynamic model.
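
The abstract reports accuracy, AUC and a hit ratio without defining them. As a rough illustration only, the sketch below computes AUC by pairwise comparison and a hit ratio as the share of truly bankrupt firms that a model flags. The function names, the 0.5 threshold and the toy scores are assumptions made for this example and are not taken from the paper.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for bankrupt, 0 for normal; scores: predicted bankruptcy risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one bankrupt and one normal firm")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # bankrupt firm ranked above normal firm
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


def hit_ratio(labels, scores, threshold=0.5):
    """Share of truly bankrupt firms whose score reaches the threshold.

    The threshold value is an assumption for illustration.
    """
    flagged = [s >= threshold for y, s in zip(labels, scores) if y == 1]
    if not flagged:
        raise ValueError("no bankrupt firms in the sample")
    return sum(flagged) / len(flagged)


# Toy data, invented for the example.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))        # 0.75
print(hit_ratio(labels, scores))  # 0.5
```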

Estimation of the Korean Yield Curve via Bayesian Variable Selection (베이지안 변수선택을 이용한 한국 수익률곡선 추정)

  • Koo, Byungsoo
    • Economic Analysis
    • /
    • v.26 no.1
    • /
    • pp.84-132
    • /
    • 2020
  • A central bank infers market expectations of future yields based on yield curves. The central bank needs to precisely understand the changes in market expectations of future yields in order to have a more effective monetary policy. This need explains why a range of models have attempted to produce yield curves and market expectations that are as accurate as possible. Alongside the development of bond markets, the interconnectedness between them and macroeconomic factors has deepened, and this has made it even more important to understand which macroeconomic variables affect yield curves. However, the existence of various theories about the determinants of yields inevitably means that previous studies have applied different macroeconomic variables when estimating yield curves. This indicates model uncertainty and naturally poses a question: Which model better estimates yield curves? Put differently, which variables should be applied to better estimate yield curves? This study employs the Dynamic Nelson-Siegel Model and takes the Bayesian approach to variable selection in order to ensure precision in estimating yield curves and market expectations of future yields. Bayesian variable selection may be an effective estimation method because it is expected to alleviate problems arising from a priori selection of the key variables comprising a model, and because it is a comprehensive approach that efficiently reflects model uncertainties in estimations. A comparison of Bayesian variable selection with the models of previous studies finds that the question of which macroeconomic variables are applied to a model has considerable impact on market expectations of future yields. This shows that model uncertainties exert great influence on the resultant estimates, and that it is reasonable to reflect model uncertainties in the estimation. 
Those implications are underscored by the superior forecasting performance of Bayesian variable selection models over those models used in previous studies. Therefore, the use of a Bayesian variable selection model is advisable in estimating yield curves and market expectations of yield curves with greater exactitude in consideration of the impact of model uncertainties on the estimation.
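
For readers unfamiliar with the Dynamic Nelson-Siegel Model named above, the sketch below evaluates the standard Nelson-Siegel curve for one date from three factors (level, slope, curvature); in the dynamic version these factors vary over time. The decay value 0.0609 (maturity in months) is a commonly used choice and is an assumption here, as are the function name and the sample factor values. The paper's Bayesian variable selection step is not shown.

```python
import math


def nelson_siegel_yield(tau, level, slope, curvature, lam=0.0609):
    """Yield at maturity tau (months) from the three Nelson-Siegel factors.

    lam is the decay parameter; 0.0609 is an assumed, commonly used value.
    """
    if tau <= 0:
        raise ValueError("maturity must be positive")
    x = lam * tau
    slope_loading = -math.expm1(-x) / x              # (1 - e^{-x}) / x
    curvature_loading = slope_loading - math.exp(-x)
    return level + slope * slope_loading + curvature * curvature_loading


# Hypothetical factor values for a single date.
for months in (3, 12, 36, 120):
    print(months, round(nelson_siegel_yield(months, 3.0, -1.0, 2.0), 4))
```

At very short maturities the curve approaches level plus slope, and at very long maturities it approaches the level factor, which is why the three factors carry those names.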

Corporate Default Prediction Model Using Deep Learning Time Series Algorithm, RNN and LSTM (딥러닝 시계열 알고리즘 적용한 기업부도예측모형 유용성 검증)

  • Cha, Sungjae;Kang, Jungseok
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.1-32
    • /
    • 2018
  • Corporate defaults have a ripple effect not only on the stakeholders of bankrupt companies, including managers, employees, creditors, and investors, but also on the local and national economy. Before the Asian financial crisis, the Korean government analyzed only SMEs and tried to improve the forecasting power of a single default prediction model rather than developing a variety of corporate default models. As a result, even large corporations, the so-called 'chaebol enterprises', went bankrupt. Even after that, the analysis of past corporate defaults focused on specific variables, and when the government carried out restructuring immediately after the global financial crisis, it focused only on certain main variables such as the 'debt ratio'. A multifaceted study of corporate default prediction models is essential to protect diverse interests and to avoid a sudden total collapse like the 'Lehman Brothers case' of the global financial crisis. The key variables driving corporate defaults vary over time. Deakin's (1972) study, which re-examined the analyses of Beaver (1967, 1968) and Altman (1968), shows that the major factors affecting corporate failure have changed. Grice's (2001) study likewise found changes in the importance of predictive variables using the models of Zmijewski (1984) and Ohlson (1980). However, past studies use static models, and most of them do not consider changes that occur over time. Therefore, to construct consistent prediction models, it is necessary to compensate for time-dependent bias by means of a time series algorithm that reflects dynamic change. Centering on the global financial crisis, which had a significant impact on Korea, this study uses 10 years of annual corporate data from 2000 to 2009. The data are divided into training, validation, and test sets of 7, 2, and 1 years, respectively. 
To construct a bankruptcy model that remains consistent over time, we first train the time series deep learning models on data from before the financial crisis (2000~2006). Parameter tuning of the existing models and the deep learning time series algorithms is conducted with validation data that includes the financial crisis period (2007~2008). This yields a model that shows a pattern similar to the results on the training data and excellent predictive power. Each bankruptcy prediction model is then rebuilt on the combined training and validation data (2000~2008), applying the optimal parameters found during validation. Finally, the corporate default prediction models, trained over nine years, are evaluated and compared using the test data (2009), demonstrating the usefulness of the corporate default prediction model based on the deep learning time series algorithm. In addition, by adding Lasso regression to the existing variable selection methods (multiple discriminant analysis and the logit model), we show that the deep learning time series model based on the three bundles of variables is useful for robust corporate default prediction. The definition of bankruptcy is the same as that of Lee (2015). Independent variables include financial information such as the financial ratios used in previous studies. Multivariate discriminant analysis, the logit model, and the Lasso regression model are used to select the optimal variable group. The predictive power of the multivariate discriminant analysis model proposed by Altman (1968), the logit model proposed by Ohlson (1980), non-time-series machine learning algorithms, and deep learning time series algorithms is compared. Corporate data suffer from the limitations of 'nonlinear variables', 'multi-collinearity' among variables, and 'lack of data'. 
The logit model is nonlinear, the Lasso regression model addresses the multi-collinearity problem, and the deep learning time series algorithm, using the variable data generation method, compensates for the lack of data. Big data technology, a leading technology of the future, is moving from simple human analysis to automated AI analysis and, ultimately, toward intertwined AI applications. Although the study of corporate default prediction models using time series algorithms is still in its early stages, the deep learning algorithm is much faster than regression analysis for corporate default prediction modeling and also has greater predictive power. With the Fourth Industrial Revolution, the current government and governments overseas are working hard to integrate such systems into the everyday life of their nations and societies. Yet deep learning time series research for the financial industry remains insufficient. This is an initial study on deep learning time series analysis of corporate defaults. We therefore hope that it will serve as comparative reference material for non-specialists beginning research that combines financial data with deep learning time series algorithms.
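
The evaluation protocol described above (train on 2000~2006, tune on 2007~2008, refit on 2000~2008, test on 2009) can be sketched as a chronological split. The record layout, field names and function name below are hypothetical; only the year boundaries come from the abstract.

```python
def chronological_split(records, train_end=2006, valid_end=2008, test_year=2009):
    """Split firm-year records by calendar year, never shuffling across time."""
    train = [r for r in records if r["year"] <= train_end]
    valid = [r for r in records if train_end < r["year"] <= valid_end]
    test = [r for r in records if r["year"] == test_year]
    return train, valid, test


# Hypothetical firm-year records: one firm observed from 2000 to 2009.
records = [{"firm": "A", "year": y, "default": 0} for y in range(2000, 2010)]

train, valid, test = chronological_split(records)
# After tuning on `valid`, the model is refit on train + valid (2000~2008)
# and evaluated once on `test` (2009).
refit = train + valid
print(len(train), len(valid), len(test), len(refit))  # 7 2 1 9
```

Splitting by year rather than at random keeps crisis-period information out of the training set, which is what lets the validation years act as a stress test.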

Comparison of Natural Flow Estimates for the Han River Basin Using TANK and SWAT Models (TANK 모형과 SWAT 모형을 이용한 한강유역의 자연유출량 산정 비교)

  • Kim, Chul-Gyum;Kim, Nam-Won
    • Journal of Korea Water Resources Association
    • /
    • v.45 no.3
    • /
    • pp.301-316
    • /
    • 2012
  • Two models, TANK and SWAT (Soil and Water Assessment Tool), were compared for simulating natural flows in the areas upstream of the Paldang Dam in the Han River basin, in order to understand the limitations of TANK and to review the applicability and capability of SWAT. Simulation results from previous research were used for the comparison. For the calibrated watersheds (Chungju Dam and Soyanggang Dam), both models gave promising results for daily flows, with a Nash-Sutcliffe model efficiency of around 0.8. TANK simulated observations during some peak flood seasons better than SWAT but performed poorly during dry seasons; in particular, its simulated flows did not fall below a certain value. This can be explained by TANK having been calibrated for larger flows rather than smaller ones. SWAT results agreed relatively well with observed flows except for some flood flows, and the simulated inflows at the Paldang Dam, taking into account discharges from the upstream dams, matched observations with a model efficiency of around 0.9. This supports the applicability of SWAT, with higher accuracy, for predicting natural flows in the absence of dam operation or artificial water use and for assessing flow variations before and after dam development. The two models were also compared for other watersheds, namely Pyeongchang-A, Dalcheon-B, Seomgang-B, Inbuk-A, Hangang-D, and Hongcheon-A, to which the calibrated TANK parameters were applied. The results were similar to those for the calibrated watersheds: TANK simulated smaller flows poorly, apart from some flood flows, and showed the same problem of remaining above a certain value in dry seasons. This indicates that applying TANK may introduce critical uncertainties in estimating low flows, which serve as an important index in water resources planning and management. 
Therefore, to reflect the complex physical characteristics of Korean watersheds and to manage water resources efficiently under future land use and water use changes driven by urbanization or climate change, it is necessary to use a physically based watershed model such as SWAT rather than an existing conceptual lumped model such as TANK.
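
The Nash-Sutcliffe model efficiency quoted above (around 0.8 and 0.9) compares the squared simulation error with the variance of the observations. A minimal sketch follows; the flow values are invented, and the floor of 2.0 is only meant to mimic the dry-season behaviour the abstract attributes to TANK.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - error / spread


# Invented daily flows; the second simulation never drops below 2.0.
observed = [8.0, 5.0, 3.0, 1.0, 0.5]
floored = [max(q, 2.0) for q in observed]
print(nash_sutcliffe(observed, observed))  # 1.0
print(nash_sutcliffe(observed, floored))
```

Because the statistic is dominated by squared errors on large flows, a model can score well overall while still misrepresenting low flows, which is consistent with the caution expressed in the abstract.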

A Study on the Development of a Simulation Model for Predicting Soil Moisture Content and Scheduling Irrigation (토양수분함량 예측 및 계획관개 모의 모형 개발에 관한 연구(I))

  • 김철회;고재군
    • Magazine of the Korean Society of Agricultural Engineers
    • /
    • v.19 no.1
    • /
    • pp.4279-4295
    • /
    • 1977
  • Two types of model were established to predict the soil moisture content, from which information on irrigation can be obtained. Model-I represents the soil moisture depletion and was established on the concept of water balance in a given soil profile. Model-II is a mathematical model derived from the analysis of soil moisture variation curves drawn from the observed data. In establishing Model-I, the method and procedure for estimating the parameters that determine variables such as evapotranspiration, effective rainfall, and drainage amount were discussed. For Model-II, empirical equations representing the soil moisture variation curves were derived from the observed data. The procedure for forecasting the timing and amount of irrigation for a given soil moisture content was discussed. The established models were checked by comparing the observed data with those predicted by the models. The results are summarized as follows:
    1. As a water balance model of a given soil profile, the soil moisture depletion, $D$, can be represented by equation (2).
    2. Among the various empirical formulae for potential evapotranspiration ($E_{tp}$), Penman's formula best fit the data observed with the evaporation pans and tanks in the Suweon area. A high positive correlation between Penman's predicted data and the data observed with a large evaporation pan was confirmed, and the regression equation was $Y = 0.7436X + 17.2918$, where $Y$ is the evaporation rate from the large evaporation pan in mm/10 days and $X$ is the potential evapotranspiration rate estimated with Penman's formula.
    3. Evapotranspiration, $E_t$, can be estimated from the potential evapotranspiration, $E_{tp}$, by introducing the consumptive use coefficient, $K_c$, which is represented by the relationship $K_c = K_{co} \cdot K_a + K_s$ (Eq. 6), where $K_{co}$ is the crop coefficient, $K_a$ is a coefficient depending on the soil moisture content, and $K_s$ is a correction coefficient.
       a. Crop coefficient, $K_{co}$: the crop coefficients of barley, bean, and wheat for each growth stage were found to depend on the crop.
       b. Coefficient depending on the soil moisture content, $K_a$: the values of $K_a$ for clay loam, sandy loam, and loamy sand showed a tendency similar to those of the Pierce type.
       c. Correction coefficient, $K_s$: the following relationships were established to estimate $K_s$ values: $K_s = K_c - K_{co} \cdot K_a$, where $K_s = 0$ if $K_c = K_{co} \cdot K_a \geq 1.0$, otherwise $K_s = 1 - K_{co} \cdot K_a$.
    4. Effective rainfall, $R_e$, was estimated using the following relationship: $R_e = D$ if $R - D \geq 0$, otherwise $R_e = R$.
    5. The difference between rainfall, $R$, and the soil moisture depletion, $D$, was taken as the drainage amount, $W_d$: $D = \sum_{i=1}^{n} (E_t - R_e - I + W_d)$ if $W_d = 0$, otherwise $D = \sum_{i=t_f}^{n} (E_t - R_e - I + W_d)$, where $t_f$ = 2∼3 days.
    6. The curves and their corresponding empirical equations for the variation of soil moisture by soil type and soil depth are shown in Fig. 8 (a, b, c, d). The general mathematical model of soil moisture variation depending on season, weather, and soil type was as follows: $SMC = \sum \left( C_i \exp(-\lambda_i t_i) + R_{e,i} - \mathrm{Excess}_i \right)$, where $SMC$ is the soil moisture content, $C$ is a constant depending on the initial soil moisture content, $\lambda$ is a constant depending on season, $t$ is time, $R_e$ is effective rainfall, and $\mathrm{Excess}$ is drainage and excess soil moisture other than drainage. The values of $\lambda$ are shown in Table 1.
    7. The timing and amount of irrigation can be predicted by equation (9-a) and equations (9-b, c), respectively.
    8. Under the given conditions, the model for scheduling irrigation was completed. Fig. 9 shows the computer flow charts of the model.
       a. To estimate potential evapotranspiration, Penman's equation was used if complete observed meteorological data were available, and Jensen-Haise's equation was used if forecasted meteorological data were available. If neither observed nor forecasted data were available, equation (15) was used.
       b. A crop calendar, built from the time at which the growth stage of the crop shows its maximum effective leaf coverage, was used as the time input.
    9. To validate the models, soil moisture content observed under various conditions from May 1975 to July 1975 was compared with the values predicted by Model-I and Model-II. Model-I shows a relative error of 4.6 to 14.3 percent, which is an acceptable range of error for engineering purposes. Model-II shows a relative error of 3 to 16.7 percent, which is slightly larger than that of Model-I.
    10. Comparing the two models, the following is concluded: Model-I, established on a theoretical background, can predict with satisfactory reliability for practical use provided that forecasted meteorological data are available. Model-II, on the other hand, is superior to Model-I in its simplicity, but it needs a long period and wide scope of observed data to predict soil moisture content acceptably. Further studies are needed on Model-II to make it acceptable for practical use.
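
Several of the relationships listed in this abstract are simple enough to write down directly. The sketch below covers the pan regression from item 2, Eq. 6 from item 3, the effective rainfall rule from item 4, and a single term of the soil moisture model from item 6. Function and parameter names are hypothetical, the numeric inputs are invented, and equations the abstract only cites by number (2, 9-a, 9-b, 9-c, 15) are not reproduced.

```python
import math


def pan_evaporation_from_penman(etp):
    """Item 2 regression: large-pan evaporation (mm/10 days) from Penman Etp."""
    return 0.7436 * etp + 17.2918


def consumptive_use_coefficient(kco, ka, ks):
    """Eq. 6: Kc = Kco * Ka + Ks."""
    return kco * ka + ks


def effective_rainfall(rain, depletion):
    """Item 4: Re = D if R - D >= 0, otherwise Re = R."""
    return depletion if rain - depletion >= 0 else rain


def soil_moisture_term(c, lam, t, re, excess):
    """One term of the item 6 sum: C * exp(-lambda * t) + Re - Excess."""
    return c * math.exp(-lam * t) + re - excess


# Invented inputs, for illustration only.
print(pan_evaporation_from_penman(100.0))
print(consumptive_use_coefficient(0.8, 0.5, 0.1))
print(effective_rainfall(30.0, 20.0), effective_rainfall(10.0, 20.0))
print(soil_moisture_term(30.0, 0.05, 5.0, 2.0, 1.0))
```

The effective rainfall rule amounts to capping rainfall at the current depletion: rain beyond what the soil profile can hold is not counted as effective.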

Export Prediction Using Separated Learning Method and Recommendation of Potential Export Countries (분리학습 모델을 이용한 수출액 예측 및 수출 유망국가 추천)

  • Jang, Yeongjin;Won, Jongkwan;Lee, Chaerok
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.69-88
    • /
    • 2022
  • One characteristic of South Korea's economic structure is its high dependence on exports. Many businesses are therefore closely tied to the global economy and the diplomatic situation. In addition, small and medium-sized enterprises (SMEs) specializing in exports are struggling due to the spread of COVID-19. This study therefore aimed to develop a model that forecasts the next year's exports to support SMEs' export strategy and decision making. It also proposed a strategy for recommending promising export countries for each item based on the forecasting model. We analyzed important variables used in previous studies, such as country-specific, item-specific, and macroeconomic variables, and collected them to train our prediction model. Exploratory data analysis (EDA) then showed that exports, the target variable, have a highly skewed distribution. To deal with this issue and improve predictive performance, we suggest a separated learning method, in which the whole dataset is divided into homogeneous subgroups and a prediction algorithm is applied to each group. The characteristics of each group can thus be learned more precisely using different input variables and algorithms. In this study, we divided the dataset into five subgroups based on exports to decrease the skewness of the target variable. After the separation, we found that each group has different characteristics in terms of countries and goods. For example, in Group 1, most of the export destinations are developing countries and the majority of exported goods are low-value products such as glass and prints. By contrast, South Korea's major export destinations, such as China, the USA, and Vietnam, fall in Group 4 and Group 5, and most exported goods in these groups are high-value products. We then used LightGBM (LGBM) and the Exponential Moving Average (EMA) for prediction. 
Considering the characteristics of each group, models were built using LGBM for Groups 1 to 4 and EMA for Group 5. To evaluate the model, we compared different model structures and algorithms, and found that the separated learning model performed best. After the model was built, we also reported the variable importance of each group using SHAP values to make the model more explainable. Based on the prediction model, we proposed a two-stage recommendation strategy for potential export countries. In the first stage, the BCG matrix was used to find Star and Question Mark markets that are expected to grow rapidly. In the second stage, we calculated scores for each country and made recommendations according to the ranking. Using this recommendation framework, potential export countries were selected and information about those countries was presented for each item. This study has several implications. First, most preceding studies examined a specific situation or country, whereas this study uses various variables and develops a machine learning model covering a wide range of countries and items. Second, to our knowledge, it is the first attempt to adopt a separated learning method for export prediction. By separating the dataset into five homogeneous subgroups, we could enhance the predictive performance of the model, and SHAP values provide a more detailed explanation of the models by group. Lastly, this study has several practical implications. Some platforms, including KOTRA, provide trade information, but most of them are based on past data, so it is not easy for companies to predict future trends. By using the model and recommendation strategy from this research, trade-related services on each platform can be improved so that companies, including SMEs, can make full use of them when setting export strategies and making decisions.
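
The separated learning idea can be illustrated in two pieces: ordering records by export value and cutting them into five groups, then applying an exponential moving average to a series from the top group. This is a sketch under stated assumptions: equal-sized groups, the smoothing factor 0.5 and all names are hypothetical choices, and the LGBM models used for Groups 1 to 4 in the study are not reproduced here.

```python
def split_into_groups(rows, key, n_groups=5):
    """Sort rows by the target value and cut them into n_groups contiguous groups.

    Equal-sized groups are an assumption; the abstract does not state the cut points.
    """
    ordered = sorted(rows, key=key)
    size, extra = divmod(len(ordered), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < extra else 0)
        groups.append(ordered[start:end])
        start = end
    return groups


def ema_forecast(series, alpha=0.5):
    """Exponential moving average of a series, used as the next-period forecast."""
    if not series:
        raise ValueError("series must not be empty")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Invented export values, skewed towards a few large records.
exports = [1, 2, 2, 3, 5, 8, 20, 60, 400, 2500]
groups = split_into_groups(exports, key=lambda v: v)
print([len(g) for g in groups])      # [2, 2, 2, 2, 2]
print(ema_forecast([1.0, 2.0, 3.0]))  # 2.25
```

Within each group the target spans a much narrower range than in the full dataset, which is the skewness reduction the abstract describes.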

Long-term forecasting reference evapotranspiration using statistically predicted temperature information (통계적 기온예측정보를 활용한 기준증발산량 장기예측)

  • Kim, Chul-Gyum;Lee, Jeongwoo;Lee, Jeong Eun;Kim, Hyeonjun
    • Journal of Korea Water Resources Association
    • /
    • v.54 no.12
    • /
    • pp.1243-1254
    • /
    • 2021
  • For water resources operation or agricultural water management, it is important to accurately predict evapotranspiration over a long future horizon on a seasonal or monthly basis. In this study, reference evapotranspiration was forecast up to 12 months in advance for the Han River basin using statistically predicted monthly temperatures and the temperature-based Hamon method. First, daily maximum and minimum temperature data for 15 meteorological stations in the basin were derived by spatio-temporal downscaling of the monthly temperature forecasts. Goodness-of-fit tests of the downscaled temperature data at each site showed that, for the monthly average daily maximum temperature, the percent bias (PBIAS) ranged from 1.3 to 6.9%, the ratio of the root mean square error to the standard deviation of the observations (RSR) ranged from 0.22 to 0.27, the Nash-Sutcliffe efficiency (NSE) ranged from 0.93 to 0.95, and the Pearson correlation coefficient (r) ranged from 0.97 to 0.98. For the monthly average daily minimum temperature, PBIAS was 7.8 to 44.7%, RSR was 0.21 to 0.25, NSE was 0.94 to 0.96, and r was 0.98 to 0.99. Differences between sites were small, and the downscaled results were similar to the observations. When the forecasted reference evapotranspiration calculated from the downscaled data was compared with the observed values for the entire region, PBIAS was 2.2 to 5.4%, RSR was 0.21 to 0.28, NSE was 0.92 to 0.96, and r was 0.96 to 0.98, indicating a very good fit. Owing to the characteristics of the statistical models and uncertainty in the downscaling process, the predicted reference evapotranspiration may deviate slightly from the observed value in periods when temperatures completely different from the past are observed. However, considering that it is a forecast for a future period, it should be sufficiently useful as information for the evaluation or operation of water resources.
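
The goodness-of-fit statistics reported above can be computed as in the sketch below. The sign convention for PBIAS (observed minus simulated, so negative values mean overestimation) is an assumption, since the abstract does not state it, and the sample values are invented. With RSR defined on the same sums as NSE, the two are linked by NSE = 1 - RSR², which is consistent with the reported pairs (for example, RSR 0.22 to 0.27 alongside NSE 0.93 to 0.95).

```python
import math


def pbias(observed, simulated):
    """Percent bias; sign convention (observed minus simulated) is assumed."""
    return 100.0 * sum(o - s for o, s in zip(observed, simulated)) / sum(observed)


def rsr(observed, simulated):
    """RMSE divided by the standard deviation of the observations."""
    mean_obs = sum(observed) / len(observed)
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return math.sqrt(error) / math.sqrt(spread)


def pearson_r(observed, simulated):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(observed)
    mo, ms = sum(observed) / n, sum(simulated) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(observed, simulated))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_s = sum((s - ms) ** 2 for s in simulated)
    return cov / math.sqrt(var_o * var_s)


# Invented values: the simulation is uniformly 0.1 too high.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.1, 2.1, 3.1, 4.1]
print(pbias(obs, sim), rsr(obs, sim), pearson_r(obs, sim))
print(1.0 - rsr(obs, sim) ** 2)  # equals the Nash-Sutcliffe efficiency
```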

A Comparison between Simulation Results of DSSAT CROPGRO-SOYBEAN at US Cornbelt using Different Gridded Weather Forecast Data (격자기상예보자료 종류에 따른 미국 콘벨트 지역 DSSAT CROPGRO-SOYBEAN 모형 구동 결과 비교)

  • Yoo, Byoung Hyun;Kim, Kwang Soo;Hur, Jina;Song, Chan-Yeong;Ahn, Joong-Bae
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.24 no.3
    • /
    • pp.164-178
    • /
    • 2022
  • Uncertainties in weather forecasts affect the reliability of yield predictions made with crop models. The objective of this study was to compare the uncertainty in crop yield prediction caused by the choice of weather forecast data. Daily weather data were produced at 10 km spatial resolution using the Weather Research and Forecasting (WRF) model. The nearest neighbor method was used to downscale these data to a resolution of 5 km (WRF5K). The Parameter-elevation Regressions on Independent Slopes Model (PRISM) was also applied to the WRF data to produce weather data at the same resolution. The WRF5K and PRISM data were used as inputs to the CROPGRO-SOYBEAN model to predict crop yield. The uncertainties of the gridded data were analyzed using cumulative growing degree days (CGDD) and cumulative solar radiation (CSRAD) during the soybean growing seasons. Degree of agreement (DOA) statistics, including the structural similarity index, were determined for the crop model outputs. Our results indicated that the DOA statistics for CGDD were correlated with those for the maturity dates predicted using the WRF5K and PRISM data. Yield forecasts had small DOA values when large spatial disagreement occurred between the maturity dates predicted using WRF5K and PRISM. These results suggest that spatial uncertainties in temperature data affect the reliability of phenology and, as a result, yield predictions to a greater degree than those in solar radiation data. This merits further study to assess the uncertainties of crop yield forecasts using a wide range of crop calendars.
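
Two of the steps mentioned above lend themselves to a short sketch: accumulating growing degree days from daily temperatures, and nearest neighbor refinement of a coarse grid (10 km to 5 km corresponds to a factor of 2). The base temperature of 10 °C, the simple averaging form of the degree-day formula, the assumption of aligned grids and the function names are all illustrative choices, not details confirmed by the abstract.

```python
def cumulative_gdd(daily_tmax_tmin, base_temp=10.0):
    """Cumulative growing degree days from (tmax, tmin) pairs in deg C.

    Uses the simple averaging method; base_temp=10.0 is an assumed value.
    """
    total = 0.0
    for tmax, tmin in daily_tmax_tmin:
        total += max((tmax + tmin) / 2.0 - base_temp, 0.0)
    return total


def nearest_neighbor_refine(grid, factor=2):
    """Refine a 2-D grid by copying each coarse cell into factor x factor fine cells.

    Assumes the fine grid is aligned with the coarse one.
    """
    rows, cols = len(grid), len(grid[0])
    return [
        [grid[i // factor][j // factor] for j in range(cols * factor)]
        for i in range(rows * factor)
    ]


# Invented values for illustration.
print(cumulative_gdd([(20.0, 10.0), (30.0, 20.0)]))  # 20.0
for row in nearest_neighbor_refine([[1, 2], [3, 4]]):
    print(row)
```

Nearest neighbor refinement adds no new spatial detail, whereas an elevation-aware method such as PRISM can, which is one reason two 5 km products built from the same WRF run may disagree.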

Analysis and Forecast of Venture Capital Investment on Generative AI Startups: Focusing on the U.S. and South Korea (생성 AI 스타트업에 대한 벤처투자 분석과 예측: 미국과 한국을 중심으로)

  • Lee, Seungah;Jung, Taehyun
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship
    • /
    • v.18 no.4
    • /
    • pp.21-35
    • /
    • 2023
  • Expectations surrounding generative AI technology and its profound ramifications are sweeping across various industrial domains. Given the anticipated pivotal role of the startup ecosystem in the utilization and advancement of generative AI technology, it is imperative to cultivate a deeper comprehension of the present state and distinctive attributes characterizing venture capital (VC) investments within this domain. The current investigation delves into South Korea's landscape of VC investment deals and prognosticates the projected VC investments by juxtaposing these against the United States, the frontrunner in the generative AI industry and its associated ecosystem. For analytical purposes, a compilation of 286 investment deals originating from 117 U.S. generative AI startups spanning the period from 2008 to 2023, as well as 144 investment deals from 42 South Korean generative AI startups covering the years 2011 to 2023, was amassed to construct new datasets. The outcomes of this endeavor reveal an upward trajectory in the count of VC investment deals within both the U.S. and South Korea during recent years. Predominantly, these deals have been concentrated within the early-stage investment realm. Noteworthy disparities between the two nations have also come to light. Specifically, in the U.S., in contrast to South Korea, the quantum of recent VC deals has escalated, marking an augmentation ranging from 285% to 488% in the corresponding developmental stage. While the interval between disparate investment stages demonstrated a slight elongation in South Korea relative to the U.S., this discrepancy did not achieve statistical significance. Furthermore, the proportion of VC investments channeled into generative AI enterprises, relative to the aggregate number of deals, exhibited a higher quotient in South Korea compared to the U.S. 
Upon a comprehensive sectoral breakdown of generative AI, it was discerned that within the U.S., 59.2% of total deals were concentrated in the text and model sectors, whereas in South Korea, 61.9% of deals centered around the video, image, and chat sectors. Through forecasting, the anticipated VC investments in South Korea from 2023 to 2029 were derived via four distinct models, culminating in an estimated average requirement of 3.4 trillion Korean won (ranging from at least 2.408 trillion won to a maximum of 5.919 trillion won). This research bears pragmatic significance as it methodically dissects VC investments within the generative AI domain across both the U.S. and South Korea, culminating in the presentation of an estimated VC investment projection for the latter. Furthermore, its academic significance lies in laying the groundwork for prospective scholarly inquiries by dissecting the current landscape of generative AI VC investments, a sphere that has hitherto remained void of rigorous academic investigation supported by empirical data. Additionally, the study introduces two innovative methodologies for the prediction of VC investment sums. Upon broader integration, application, and refinement of these methodologies within diverse academic explorations, they stand poised to enhance the prognosticative capacity pertaining to VC investment costs.


A Study on the Prediction Model of Stock Price Index Trend based on GA-MSVM that Simultaneously Optimizes Feature and Instance Selection (입력변수 및 학습사례 선정을 동시에 최적화하는 GA-MSVM 기반 주가지수 추세 예측 모형에 관한 연구)

  • Lee, Jong-sik;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.4
    • /
    • pp.147-168
    • /
    • 2017
  • Stock market forecasting has long been studied in academia, and many forecasting models using various techniques now exist. Recently, many attempts have been made to predict the stock index using machine learning methods, including deep learning. Both fundamental analysis and technical analysis are used in traditional stock investment, but technical analysis is better suited to short-term trading prediction and to the application of statistical and mathematical techniques. Most studies using technical indicators model stock price prediction as a binary classification (rising or falling) of future market movement, usually for the next trading day. However, binary classification has many drawbacks for predicting trends, identifying trading signals, or signaling portfolio rebalancing. In this study, we extend the existing binary scheme to a multi-class scheme of stock index trends (upward trend, boxed, downward trend). Techniques such as Multinomial Logistic Regression Analysis (MLOGIT), Multiple Discriminant Analysis (MDA), and Artificial Neural Networks (ANN) can be applied to this multi-class problem; we build on Multi-classification Support Vector Machines (MSVM), which have shown superior prediction performance, and propose an optimization model that uses a Genetic Algorithm (GA) as a wrapper to improve MSVM performance. In particular, the proposed model, named GA-MSVM, is designed to maximize model performance by simultaneously optimizing the kernel function parameters of MSVM, the selection of input variables (feature selection), and instance selection.
To verify the performance of the proposed model, we applied it to real data. The results show that the proposed method outperforms the conventional multi-class SVM, previously known to show the best prediction performance, as well as existing artificial intelligence / data mining techniques such as MDA, MLOGIT, and CBR. In particular, instance selection was found to play a very important role in predicting the stock index trend, contributing more to model improvement than the other factors. To verify the usefulness of GA-MSVM, we applied it to forecasting the trend of Korea's KOSPI200 stock index. Our research is primarily aimed at predicting trend segments in order to capture trading signals or short-term trend transition points. The experimental data set comprises technical indicators for the KOSPI200 index, such as price and volatility indices (2004 to 2017), and macroeconomic data (interest rate, exchange rate, S&P 500, etc.). Using a variety of statistical methods, including one-way ANOVA and stepwise MDA, 15 indicators were selected as candidate independent variables. The dependent variable, trend classification, takes three states: 1 (upward trend), 0 (boxed), and -1 (downward trend). For each class, 70% of the data was used for training and the remaining 30% for validation. To evaluate the proposed model, comparative experiments were conducted with MDA, MLOGIT, CBR, ANN, and MSVM. MSVM adopted the One-Against-One (OAO) approach, which is known as the most accurate of the MSVM approaches. Although there are some limitations, the final experimental results demonstrate that the proposed model, GA-MSVM, performs at a significantly higher level than all comparative models.
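
The wrapper idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the chromosome layout, the function names (`decode`, `fitness`, `evolve`), and the GA settings are all assumptions made for illustration. To stay dependency-free, a nearest-centroid classifier stands in for MSVM, so the kernel-parameter genes mentioned in the abstract are omitted. Each chromosome is a bit string whose first part masks features and whose second part masks training instances, and its fitness is the accuracy on a held-out set.

```python
import random

def decode(chrom, n_features, n_instances):
    """Split a flat bit list into a feature mask and an instance mask."""
    feat_mask = chrom[:n_features]
    inst_mask = chrom[n_features:n_features + n_instances]
    return feat_mask, inst_mask

def centroid_predict(train_X, train_y, x):
    """Nearest-centroid classifier (a stand-in for MSVM, not the paper's model)."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    best, best_d = None, float("inf")
    for label, acc in sums.items():
        d = sum((a / counts[label] - v) ** 2 for a, v in zip(acc, x))
        if d < best_d:
            best, best_d = label, d
    return best

def fitness(chrom, X, y, X_val, y_val):
    """Held-out accuracy using only the selected features and instances."""
    feat_mask, inst_mask = decode(chrom, len(X[0]), len(X))
    cols = [j for j, bit in enumerate(feat_mask) if bit]
    rows = [i for i, bit in enumerate(inst_mask) if bit]
    if not cols or not rows:
        return 0.0  # an empty selection cannot train a classifier
    tr_X = [[X[i][j] for j in cols] for i in rows]
    tr_y = [y[i] for i in rows]
    hits = sum(
        centroid_predict(tr_X, tr_y, [x[j] for j in cols]) == t
        for x, t in zip(X_val, y_val)
    )
    return hits / len(y_val)

def evolve(X, y, X_val, y_val, pop_size=20, generations=15, p_mut=0.05, seed=0):
    """Simple GA: truncation selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    length = len(X[0]) + len(X)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    score = lambda c: fitness(c, X, y, X_val, y_val)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[:pop_size // 2]  # the top half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=score)
```

In a setting closer to the abstract, `y` would hold the trend labels 1, 0, and -1, the fitness function would train a one-against-one multi-class SVM instead of the centroid classifier, and extra genes would encode its kernel parameters.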