• Title/Summary/Keyword: Vector Decomposition


Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings with statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. Their major drawback is that they rest on strict assumptions: linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and Support Vector Machines (SVM). SVM in particular is recognized as a new and promising method for classification and regression analysis. SVM learns a separating hyperplane that maximizes the margin between two categories. It is simple enough to be analyzed mathematically, yet delivers high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound on the generalization error. In addition, the SVM solution may be a global optimum, so overfitting is unlikely to occur. SVM also does not require many training samples, since it builds prediction models using only the representative samples near the boundaries, called support vectors. A number of experimental studies have shown that SVM has been applied successfully in a variety of pattern recognition fields. However, there are three major drawbacks that can degrade SVM's performance.
First, SVM was originally proposed for binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not match the performance that SVM achieves on binary-class problems. Second, approximation algorithms (e.g., decomposition methods or the sequential minimal optimization algorithm) can reduce computation time in multi-class settings, but they may deteriorate classification performance. Third, multi-class prediction problems often suffer from data imbalance, which occurs when the number of instances in one class greatly outnumbers that in another. Such data sets often produce a default classifier with a skewed boundary and thus reduced classification accuracy. SVM ensemble learning is one machine learning approach to coping with these drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weights on misclassified observations across iterations. Observations that previous classifiers predicted incorrectly are chosen more often than those predicted correctly, so boosting attempts to produce new classifiers that are better at predicting examples on which the current ensemble performs poorly. In this way, it can reinforce the training of misclassified observations from the minority class. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem.
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process accounts for geometric mean-based accuracy and errors across the classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross-validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and each set is in turn used as the test set while the classifier trains on the other nine. That is, the cross-validated folds are tested independently for each algorithm. Through these steps, we obtained results for the classifiers on each of the 30 experiments. In terms of arithmetic mean-based prediction accuracy, MGM-Boost (52.95%) outperforms both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds differs significantly. The results indicate that the performance of MGM-Boost differs significantly from the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
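The geometric mean-based accuracy that MGM-Boost builds on is easy to illustrate. The sketch below (plain NumPy, with made-up rating labels; it does not reproduce MGM-Boost's weight-update rule) computes the per-class recalls and their geometric mean, which collapses toward zero whenever any single class, such as a minority rating grade, is predicted poorly:

```python
import numpy as np

def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of per-class recalls.

    Unlike arithmetic accuracy, this drops sharply when any single
    class (e.g. a minority rating grade) is predicted poorly.
    """
    classes = np.unique(y_true)
    recalls = []
    for c in classes:
        mask = (y_true == c)
        recalls.append(np.mean(y_pred[mask] == y_true[mask]))
    return float(np.prod(recalls) ** (1.0 / len(recalls)))

# Hypothetical 3-grade rating sample: grade 2 is the minority class.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 2])  # one minority-class error

arith = float(np.mean(y_pred == y_true))
gmean = geometric_mean_accuracy(y_true, y_pred)
print(arith, gmean)
```

On these toy labels, arithmetic accuracy is 0.9 while the geometric mean drops to about 0.79, because the single minority-class error counts far more heavily.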

The Determination Factor's Variation of Real Estate Price after Financial Crisis in Korea (2008년 금융위기 이후 부동산가격 결정요인 변화 분석)

  • Kim, Yong-Soon;Kwon, Chi-Hung;Lee, Kyung-Ae;Lee, Hyun-Rim
    • Land and Housing Review / v.2 no.4 / pp.367-377 / 2011
  • This paper investigates changes in the determination factors of real estate prices in Korea after the sub-prime financial crisis, using a VAR model. The model includes the land price, housing price, and housing rent (Jeonse) price over the period 2000:1Q to 2011:2Q, and uses the interest rate, real GDP, the consumer price index, KOSPI, housing construction volume, and the amount of land sales for impulse response and variance decomposition analysis. The data cover two sub-periods divided at 2008:3Q, when the sub-prime crisis occurred: one from 2000:1Q to 2008:3Q and the other from 2000:1Q to 2011:2Q. Comparing before and after the crisis, the influence of real GDP on the land price has expanded, while the effect of interest rate variation has weakened owing to the stagnation of the economy and the housing construction market. The housing price is little influenced by the interest rate and real GDP; it is driven mainly by its own variation and by variation in the Jeonse price. Given the rapid recent increase in the Jeonse price, the housing price may well rise. The influence of the economic indices and the housing price on the Jeonse price has also weakened relative to the period before the crisis, and, like the housing price, it is driven mainly by its own variation. In short, real estate prices have become less tied to fundamental market factors such as the interest rate and real GDP, because they are influenced by exogenous factors such as structural changes in the population. Policy makers, consumers, and construction suppliers need an accurate view of the current real estate market and economic conditions.
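The impulse response analysis used here can be sketched for the simplest case. For a VAR(1) process y_t = A y_{t-1} + e_t, the response of the system h periods after a one-time unit shock is just A applied h times to the shock vector; the coefficient matrix below is illustrative, not an estimate from the paper's Korean data:

```python
import numpy as np

# Illustrative VAR(1): y_t = A @ y_{t-1} + e_t, with two variables
# standing in for, e.g., the housing price and the interest rate.
# The coefficients are made up for the sketch.
A = np.array([[0.5, 0.2],
              [0.1, 0.4]])

def impulse_response(A, shock, horizons):
    """Response of each variable h steps after a one-time unit shock.

    For a VAR(1), the h-step response is A applied h times to the shock.
    """
    responses = []
    y = np.array(shock, dtype=float)
    for _ in range(horizons):
        responses.append(y.copy())
        y = A @ y
    return np.array(responses)

# Unit shock to the first variable.
irf = impulse_response(A, [1.0, 0.0], horizons=6)
print(irf)  # responses decay toward zero since A is stable
```

Because both eigenvalues of A lie inside the unit circle, the responses die out, which is the stability condition checked before any IRF or variance decomposition is interpreted.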

Factor Analysis Affecting on Changes in Handysize Freight Index and Spot Trip Charterage (핸디사이즈 운임지수 및 스팟용선료 변화에 영향을 미치는 요인 분석)

  • Lee, Choong-Ho;Kim, Tae-Woo;Park, Keun-Sik
    • Journal of Korea Port Economic Association / v.37 no.2 / pp.73-89 / 2021
  • Handysize bulk carriers can transport a variety of cargoes that mid-to-large-size ships cannot, and their spot chartering market is active and independent of the mid-to-large-size market, carrying greater risk owing to variability in market conditions and charterage. In this study, the Granger causality test, the impulse response function (IRF), and forecast error variance decomposition (FEVD) were performed on monthly time series data. The Granger causality test shows that the coking coal price, the Japanese steel plate price, the hot rolled steel sheet price, fleet volume, and the bunker price have causality toward the Baltic Handysize Index (BHSI) and charterage. After confirming the appropriate lag order and the stability of the vector autoregressive (VAR) model, the IRF and FEVD were analyzed. The IRF shows that three variables, the coking coal price, the hot rolled steel sheet price, and the bunker price, are significant at both the upper and lower limits of the confidence interval, with the impulse from the hot rolled steel sheet price having the strongest effect. The FEVD shows that the explanatory power for both BHSI and charterage follows the same order: hot rolled steel sheet price, coking coal price, bunker price, Japanese steel plate price, and fleet volume. The explanatory power increases gradually over the horizon, reaching 30% for BHSI and 26% for charterage. To differentiate this work from previous studies and capture short-term lag effects, the analysis used monthly price data for the major cargoes of Handysize bulk carriers, and meaningful results were derived for predicting monthly market conditions. This study can help shipping companies that operate Handysize bulk carriers, and other parties in the Handysize chartering market, predict short-term market conditions.
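The Granger causality test behind these results compares two nested regressions: x is said to Granger-cause y if adding lags of x to a regression of y on its own lags significantly reduces the residual sum of squares. A minimal NumPy sketch on synthetic monthly series (the series and coefficients are made up, not the paper's data):

```python
import numpy as np

def lagmat(s, lags):
    """Rows are t = lags..n-1; column j holds s[t-1-j]."""
    n = len(s)
    return np.column_stack([s[lags - 1 - j : n - 1 - j] for j in range(lags)])

def granger_f(y, x, lags=2):
    """F statistic for the null 'lags of x add nothing to predicting y'."""
    n = len(y)
    Y = y[lags:]
    Xr = np.column_stack([np.ones(n - lags), lagmat(y, lags)])  # y's own lags only
    Xu = np.column_stack([Xr, lagmat(x, lags)])                 # ...plus lags of x
    ssr = lambda X: np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)
    ssr_r, ssr_u = ssr(Xr), ssr(Xu)
    df = len(Y) - Xu.shape[1]
    return ((ssr_r - ssr_u) / lags) / (ssr_u / df)

# Synthetic monthly series in which x genuinely leads y by one month.
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

print(granger_f(y, x))  # large F: reject 'x does not Granger-cause y'
print(granger_f(x, y))  # small F: no evidence that y leads x
```

The F statistic would normally be compared against an F(lags, df) critical value; the asymmetry between the two directions is the point of the test.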

A Study on the Effects of Export Insurance on the Exports of SMEs and Conglomerates (수출보험이 국내 중소기업 및 대기업의 수출에 미치는 영향에 관한 연구)

  • Lee, Dong-Joo
    • Korea Trade Review / v.42 no.2 / pp.145-174 / 2017
  • Recently, owing to the worsening global recession, Korea, a small export-oriented economy, has seen its exports decrease while the domestic economy continues to stagnate. For the continued growth of the economy through exports, we therefore need to analyze the effectiveness of export support systems such as export insurance and prepare ways to expand exports. This study investigates the effects of export insurance on the exports of SMEs as well as large enterprises (LEs). For this purpose, a time series analysis was conducted on data including exports, export insurance acquisitions, the export price index, the exchange rate, and the coincident composite index (CCI). First, the Granger causality test found that the exports of LEs have a causal relationship with the CCI, and the CCI has a causal relationship with the short-term export insurance record. Second, the VAR analysis shows that export insurance acquisitions and the export price index have a positive effect on the exports of LEs, while short-term export insurance has a negative effect on them. Third, the variance decomposition shows that, compared with SMEs, the exports of LEs are influenced much more over the mid to long term by short-term export insurance acquisitions. Fourth, short-term export insurance has a positive effect on the exports of SMEs. To activate short-term export insurance for SMEs, local governments need to expand their support for SMEs. This study suggests policy implications for establishing an effective export insurance policy by analyzing the effects of export insurance on the exports of SMEs as well as LEs. To measure the export support effect of export insurance more precisely, a time series analysis of export results by industry, according to insurance acquisition results, is needed.
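The variance decomposition step can be sketched for a simple VAR(1): with orthogonalized shocks obtained from a Cholesky factor of the shock covariance, the share of variable i's h-step forecast error variance attributable to shock j is built from squared moving-average impacts. The numbers below are illustrative, not estimates of the export-insurance system studied here:

```python
import numpy as np

# Illustrative VAR(1) y_t = A @ y_{t-1} + e_t with shock covariance
# Sigma; the values are made up for the sketch.
A = np.array([[0.6, 0.2],
              [0.0, 0.5]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])

def fevd(A, Sigma, horizon):
    """Forecast error variance decomposition for a VAR(1).

    Returns shares[h, i, j]: the fraction of variable i's h-step
    forecast error variance attributable to orthogonalized shock j.
    """
    P = np.linalg.cholesky(Sigma)   # orthogonalize the shocks
    k = A.shape[0]
    shares = np.zeros((horizon, k, k))
    Psi = np.eye(k)                 # moving-average coefficient Psi_0
    acc = np.zeros((k, k))
    for h in range(horizon):
        acc += (Psi @ P) ** 2       # add squared impact at lag h
        shares[h] = acc / acc.sum(axis=1, keepdims=True)
        Psi = A @ Psi               # Psi_{h+1} = A @ Psi_h
    return shares

shares = fevd(A, Sigma, horizon=8)
print(shares[-1])  # each row sums to 1: one variable's variance shares
```

The Cholesky ordering matters: the first variable's one-step variance is attributed entirely to its own shock, which is why applied work reports the ordering used.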


Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.71-88 / 2017
  • Language models were originally developed for speech recognition and language processing. Given a set of example sentences, a language model predicts the next word or character from sequential input data. N-gram models have been widely used, but they cannot model correlations between input units efficiently, since they are probabilistic models based on the frequency of each unit in the training set. Recently, as deep learning has developed, recurrent neural network (RNN) and long short-term memory (LSTM) models have been widely used as neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependencies between the objects entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts need to be decomposed into words or morphemes. However, since a training set of sentences generally contains a huge number of words or morphemes, the dictionary becomes very large, which increases model complexity. In addition, word-level or morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, for highly morphological languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to introduce errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit of Korean text. We construct language models using three or four LSTM layers. Each model was trained with stochastic gradient descent and with more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was conducted on Old Testament texts using the deep learning package Keras, based on Theano.
After pre-processing, the dataset contained 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed input vectors of 20 consecutive characters paired with the following 21st character as the output. In total, 1,023,411 input-output pairs were included in the dataset, divided into training, validation, and test sets in a 70:15:15 proportion. All simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss function evaluated on the validation set, the perplexity evaluated on the test set, and the training time of each model. All of the optimization algorithms except stochastic gradient descent showed similar validation loss and perplexity, clearly superior to those of stochastic gradient descent, which also took the longest to train for both the 3- and 4-LSTM-layer models. On average, the 4-LSTM-layer model took 69% longer to train than the 3-LSTM-layer model, yet its validation loss and perplexity were not significantly improved and became even worse under some conditions. On the other hand, when comparing the automatically generated sentences, the 4-LSTM-layer model tended to generate sentences closer to natural language than the 3-LSTM-layer model. Although the completeness of the generated sentences differed slightly between the models, sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost perfect grammatically. The results of this study are expected to be widely used in Korean language processing and speech recognition, which are foundations of artificial intelligence systems.
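Perplexity, the evaluation metric used above, is the exponential of the average negative log-probability the model assigns to held-out text. As an illustration of the metric only (using a character bigram baseline of the kind the abstract contrasts with LSTMs, not the paper's Keras models), the sketch below trains add-one smoothed bigram counts on a toy string and scores held-out text:

```python
import math
from collections import Counter

def train_bigram(text):
    """Character bigram counts plus context (preceding-character) counts."""
    pairs = Counter(zip(text, text[1:]))
    ctx = Counter(text[:-1])
    vocab = sorted(set(text))
    return pairs, ctx, vocab

def perplexity(text, pairs, ctx, vocab):
    """Perplexity of `text` under an add-one smoothed bigram model.

    Perplexity = exp(-mean log P(c_t | c_{t-1})); lower is better.
    """
    V = len(vocab)
    log_prob = 0.0
    n = 0
    for a, b in zip(text, text[1:]):
        p = (pairs[(a, b)] + 1) / (ctx[a] + V)  # add-one smoothing
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)

# Tiny illustrative corpus; the paper uses Old Testament text instead.
train = "the model predicts the next character from the current one "
pairs, ctx, vocab = train_bigram(train)
print(perplexity("the next character", pairs, ctx, vocab))
```

A neural language model is evaluated the same way, with the bigram probability replaced by the softmax probability the network assigns to the next character.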