• Title/Summary/Keyword: Data driven method

Optimization of Multiclass Support Vector Machine using Genetic Algorithm: Application to the Prediction of Corporate Credit Rating (유전자 알고리즘을 이용한 다분류 SVM의 최적화: 기업신용등급 예측에의 응용)

  • Ahn, Hyunchul
    • Information Systems Review / v.16 no.3 / pp.161-177 / 2014
  • Corporate credit rating assessment is a complicated process in which various factors describing a company are taken into consideration. Such assessment is known to be very expensive because domain experts must be employed to assess the ratings. As a result, data-driven corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has received considerable attention from researchers and practitioners. In particular, statistical methods such as multiple discriminant analysis (MDA) and multinomial logistic regression analysis (MLOGIT), and AI methods including case-based reasoning (CBR), artificial neural network (ANN), and multiclass support vector machine (MSVM), have been applied to corporate credit rating. Among them, MSVM has recently become popular because of its robustness and high prediction accuracy. In this study, we propose a novel optimized MSVM model and apply it to corporate credit rating prediction in order to enhance accuracy. Our model, named 'GAMSVM (Genetic Algorithm-optimized Multiclass Support Vector Machine),' is designed to simultaneously optimize the kernel parameters and the feature subset selection. Prior studies such as Lorena and de Carvalho (2008) and Chatterjee (2013) show that proper kernel parameters may improve the performance of MSVMs. Also, the results of studies such as Shieh and Yang (2008) and Chatterjee (2013) imply that appropriate feature selection may lead to higher prediction accuracy. Based on these prior studies, we propose to apply GAMSVM to corporate credit rating prediction. As a tool for optimizing the kernel parameters and the feature subset selection, we suggest the genetic algorithm (GA). GA is known as an efficient and effective search method that attempts to simulate biological evolution. By applying genetic operations such as selection, crossover, and mutation, it is designed to gradually improve the search results.
In particular, the mutation operator helps prevent GA from falling into local optima, so it can find a globally optimal or near-optimal solution. GA has been widely applied to search for optimal parameters or feature subsets of AI techniques including MSVM. For these reasons, we also adopt GA as an optimization tool. To empirically validate the usefulness of GAMSVM, we applied it to a real-world case of credit rating in Korea. Our application is in bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. The experimental dataset was collected from a large credit rating company in South Korea. It contained 39 financial ratios of 1,295 companies in the manufacturing industry, together with their credit ratings. Using various statistical methods including one-way ANOVA and stepwise MDA, we selected 14 financial ratios as the candidate independent variables. The dependent variable, i.e., credit rating, was labeled with four classes: 1 (A1), 2 (A2), 3 (A3), and 4 (B and C). Eighty percent of the data for each class was used for training, and the remaining 20 percent was used for validation. To overcome the small sample size, we applied five-fold cross validation to our dataset. To examine the competitiveness of the proposed model, we also tested several comparative models including MDA, MLOGIT, CBR, ANN, and MSVM. For MSVM, we adopted the One-Against-One (OAO) and DAGSVM (Directed Acyclic Graph SVM) approaches because they are known to be the most accurate among the various MSVM approaches. GAMSVM was implemented using LIBSVM, an open-source software package, and Evolver 5.5, a commercial software package that enables GA. The other comparative models were tested using various statistical and AI packages such as SPSS for Windows, Neuroshell, and Microsoft Excel VBA (Visual Basic for Applications). Experimental results showed that the proposed model, GAMSVM, outperformed all the competing models.
In addition, the model was found to use fewer independent variables while showing higher accuracy. In our experiments, five variables, X7 (total debt), X9 (sales per employee), X13 (years after founded), X15 (accumulated earning to total asset), and X39 (the index related to the cash flows from operating activity), were found to be the most important factors in predicting corporate credit ratings. However, the values of the finally selected kernel parameters were found to be almost the same among the data subsets. To examine whether the predictive performance of GAMSVM was significantly greater than that of the other models, we used the McNemar test. As a result, we found that GAMSVM was better than MDA, MLOGIT, CBR, and ANN at the 1% significance level, and better than OAO and DAGSVM at the 5% significance level.
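
The McNemar comparison described in this abstract can be sketched roughly as follows. This is only an illustration, assuming the exact (binomial) two-sided form of the test; the abstract does not say which variant the authors used, and the function name `mcnemar_exact` and the toy labels are hypothetical.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions of two classifiers.

    Only the discordant cases matter:
      b = cases where model A is right and model B is wrong
      c = cases where model A is wrong and model B is right
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the two models never disagree in correctness
    # Probability of a split at least as lopsided as the observed one (one tail)
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)

# Toy example (made-up labels, not data from the paper)
y_true = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
model_a = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2]   # always correct
model_b = [1, 2, 4, 1, 2, 3, 4, 1, 2, 3]   # correct on the first two cases only
b, c, p_value = mcnemar_exact(y_true, model_a, model_b)
print(b, c, p_value)
```

A p-value below 0.01 or 0.05 would correspond to the 1% and 5% significance levels quoted in the abstract.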

Numerical Simulation of the Formation of Oxygen Deficient Water-masses in Jinhae Bay (진해만의 빈산소 수괴 형성에 관한 수치실험)

  • CHOI Woo-Jeung;PARK Chung-Kill;LEE Suk-Mo
    • Korean Journal of Fisheries and Aquatic Sciences / v.27 no.4 / pp.413-433 / 1994
  • Jinhae Bay was once a productive fishing area. It is, however, now notorious for its red tides, and oxygen-deficient water masses develop extensively in summer. As a result, the shellfish production of the bay has been decreasing and mass mortality often occurs. Under these circumstances, the three-dimensional numerical hydrodynamic and material cycle models, which were developed by the Institute for Resources and Environment of Japan, were applied to analyze the processes affecting oxygen depletion and to evaluate the environmental capacity for receiving pollutant loads without dissolved oxygen depletion. In field surveys, oxygen-deficient water masses with concentrations below 2.0 mg/l were formed in the bottom layer of Masan Bay and the western part of Jinhae Bay during the summer. Current directions, computed for the $M_2$ constituent, were mainly toward the western part of Jinhae Bay during flood flows and in the opposite direction during ebb flows. Tidal current velocities during the ebb tide were stronger than those during the flood tide. The comparison between the simulated and observed tidal ellipses showed fairly good agreement. The residual currents, which were obtained by averaging the simulated tidal currents over one tidal cycle, showed the presence of counterclockwise eddies in the central part of Jinhae Bay. Density-driven currents were generated southward at the surface and northward at the bottom in Masan Bay and Jindong Bay, where fresh water from rivers entered. The material cycle model was calibrated with data surveyed in the study area from June to July 1992. The calibrated results are in fairly good agreement with the measured values, within a relative error of $28\%$.
The simulated dissolved oxygen concentrations in the bottom layer were relatively high, $6.0{\sim}8.0$ mg/l, at the boundaries, but oxygen-deficient water masses with concentrations of 2.0 mg/l or less were formed in the inner part of Masan Bay and the western part of Jinhae Bay. The results of sensitivity analyses showed that sediment oxygen demand (SOD) was one of the most important influences on the formation of oxygen depletion. Therefore, reducing the SOD by improving the polluted sediment is an effective way to control the oxygen-deficient water masses and to conserve the coastal environment. In the simulations, oxygen-deficient water masses in Masan Bay recovered to 5.0 mg/l when a $50\%$ reduction in input COD loads from the Masan basin and a $70\%$ reduction in SOD were applied. In the western part of Jinhae Bay, oxygen-deficient water masses recovered to 5.0 mg/l when a $95\%$ reduction in SOD and a $90\%$ reduction in fecal loads from culturing grounds were applied.
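
The residual-current step mentioned in this abstract (averaging tidal currents over one tidal cycle) can be illustrated with a small sketch. It assumes evenly spaced velocity samples covering exactly one cycle and uses a synthetic single-constituent series; the function names and all numbers are hypothetical and are not taken from the paper's model.

```python
import math

def synthetic_tidal_series(mean, amplitude, phase, n_samples):
    """Evenly spaced samples over exactly one tidal cycle (illustrative only)."""
    return [
        mean + amplitude * math.cos(2.0 * math.pi * k / n_samples - phase)
        for k in range(n_samples)
    ]

def residual_current(u_series, v_series):
    """Average each velocity component over the cycle.

    The oscillating tidal part cancels over a full cycle, so what remains
    is the mean (residual) flow in the east (u) and north (v) directions.
    """
    n = len(u_series)
    return sum(u_series) / n, sum(v_series) / n

# Made-up values in m/s: a strong tidal oscillation on top of a weak mean flow
u = synthetic_tidal_series(mean=0.03, amplitude=0.40, phase=0.0, n_samples=24)
v = synthetic_tidal_series(mean=-0.01, amplitude=0.15, phase=math.pi / 3, n_samples=24)
u_res, v_res = residual_current(u, v)
print(u_res, v_res)
```

If the averaging window does not match a whole number of tidal cycles, part of the oscillation leaks into the mean, which is why the window length matters in practice.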

Water Balance Projection Using Climate Change Scenarios in the Korean Peninsula (기후변화 시나리오를 활용한 미래 한반도 물수급 전망)

  • Kim, Cho-Rong;Kim, Young-Oh;Seo, Seung Beom;Choi, Su-Woong
    • Journal of Korea Water Resources Association / v.46 no.8 / pp.807-819 / 2013
  • This study proposes a new methodology for future water balance projection under climate change that assigns a weight to each scenario instead of inputting GCM-based future streamflows into a water balance model directly. The K-nearest neighbor algorithm was employed to assign the weights, and streamflow in the non-flood period (October to the following June) was selected as the criterion for assigning weights. GCM-driven precipitation was input to the TANK model to simulate future streamflow scenarios, and Quantile Mapping was applied to correct the bias between the GCM hindcast and historical data. Based on these bias-corrected streamflows, different weights were assigned to the streamflow scenarios to calculate water shortage for the projection periods: 2020s (2010~2039), 2050s (2040~2069), and 2080s (2070~2099). When the proposed methodology was applied to project water shortage over the Korean Peninsula, average water shortage for the 2020s was projected to increase by 10~32% compared to the baseline (1967~2003). In addition, as streamflow in the non-flood period gradually decreases toward the 2080s, average water shortage for the 2080s is projected to increase by up to 97% (516.5 million $m^3/yr$) compared to the baseline. While existing research on climate change gives a radical increase in future water shortage, the results projected by the weighting method show a more conservative change. This study is significant in that it makes water balance projection under climate change applicable while keeping the existing framework of national water resources planning, which lessens confusion for decision-makers in the water sector.
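
The bias-correction step named in this abstract can be sketched as an empirical quantile mapping. This is a minimal illustration under stated assumptions: it maps a model value to the observed value at the same empirical quantile, using linear interpolation between sorted samples. The abstract does not specify whether an empirical or a parametric form was used, and the function names and numbers below are hypothetical.

```python
from bisect import bisect_right
import math

def _fractional_rank(sorted_sample, x):
    """Position of x within a sorted sample, as a fractional index in [0, n-1]."""
    n = len(sorted_sample)
    if x <= sorted_sample[0]:
        return 0.0
    if x >= sorted_sample[-1]:
        return float(n - 1)
    hi = bisect_right(sorted_sample, x)  # first index whose value exceeds x
    lo = hi - 1
    return lo + (x - sorted_sample[lo]) / (sorted_sample[hi] - sorted_sample[lo])

def _value_at_rank(sorted_sample, rank):
    """Linearly interpolated sample value at a fractional index."""
    lo = int(math.floor(rank))
    hi = min(lo + 1, len(sorted_sample) - 1)
    frac = rank - lo
    return sorted_sample[lo] * (1.0 - frac) + sorted_sample[hi] * frac

def quantile_map(value, model_hindcast, observed):
    """Replace a model value by the observed value at the same quantile.

    Both samples need at least two entries. Values outside the hindcast
    range are clamped to the observed extremes.
    """
    model_sorted = sorted(model_hindcast)
    obs_sorted = sorted(observed)
    p = _fractional_rank(model_sorted, value) / (len(model_sorted) - 1)
    return _value_at_rank(obs_sorted, p * (len(obs_sorted) - 1))

# Made-up example: the model runs about 20% too wet compared to observations
observed = [10.0, 20.0, 30.0, 40.0, 50.0]
hindcast = [12.0, 24.0, 36.0, 48.0, 60.0]
print(quantile_map(36.0, hindcast, observed))
```

In a projection workflow the same mapping, built from the hindcast and historical records, would then be applied to the future model values.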

Accuracy of HF radar-derived surface current data in the coastal waters off the Keum River estuary (금강하구 연안역에서 HF radar로 측정한 유속의 정확도)

  • Lee, S.H.;Moon, H.B.;Baek, H.Y.;Kim, C.S.;Son, Y.T.;Kwon, H.K.;Choi, B.J.
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY / v.13 no.1 / pp.42-55 / 2008
  • To evaluate the accuracy of currents measured by HF radar in the coastal sea off the Keum River estuary, we compared the facing radial vectors of two HF radars, and compared HF radar-derived currents with in-situ current measurements. Principal component analysis was used to extract the regression line and RMS deviation in each comparison. When the radial vectors of the two facing radars at the mid-point of the baseline are compared, the RMS deviation is 4.4 cm/s in winter and 5.4 cm/s in summer. When the GDOP (Geometric Dilution of Precision) effect is corrected for in the RMS deviations obtained from the comparison between HF radar-derived and current-meter-measured currents, the error of the velocity combined from HF radar-derived currents is less than 5.1 cm/s at stations having moderate GDOP values. These two results, obtained by different methods, suggest that the lower limit of the accuracy of HF radar-derived currents is 5.4 cm/s in our study area. As mentioned in previous studies, RMS deviations become large at stations located near the islands and increase as a function of mean distance from the radar site, owing to decreases in the signal-to-noise level and in the intersection angle of the radial vectors. We found that the separation of RMS deviations using GDOP values can produce an uncertain error bound for HF radar-derived currents if the GDOP values of the components are very close and the RMS deviations obtained from the current component comparisons are also close. When the currents measured at stations having moderate GDOP values are separated into tidal and subtidal currents, the characteristics of the tidal current ellipses derived from HF radar-derived currents show good agreement with those from current-meter-measured currents, and the time variation of the subtidal current shows a response reflecting physical processes driven by wind and the density field.
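
The use of principal component analysis to obtain a regression line and an RMS deviation between two velocity series can be sketched as below. This is an illustration under assumptions: it fits the major principal axis of the paired data and reports the RMS of the perpendicular distances to that axis. The abstract does not define its RMS deviation precisely, so this interpretation, the function name `principal_axis_fit`, and the sample values are hypothetical.

```python
import math

def principal_axis_fit(x, y):
    """Fit the major principal axis of paired data.

    Returns (slope, intercept, rms), where rms is the root-mean-square
    perpendicular distance of the points from the fitted line. A vertical
    principal axis yields a very large slope.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x) / n
    syy = sum((b - mean_y) ** 2 for b in y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Orientation of the major axis of the 2x2 covariance matrix
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    slope = math.tan(theta)
    intercept = mean_y - slope * mean_x
    # The smaller eigenvalue is the variance perpendicular to the major axis
    half_trace = 0.5 * (sxx + syy)
    radius = math.hypot(0.5 * (sxx - syy), sxy)
    minor_variance = max(0.0, half_trace - radius)
    return slope, intercept, math.sqrt(minor_variance)

# Made-up paired velocities in cm/s (radar-derived vs. current-meter values)
radar = [-20.0, -11.0, -1.0, 9.0, 21.0, 30.0]
meter = [-18.0, -12.0, 0.0, 11.0, 19.0, 31.0]
slope, intercept, rms = principal_axis_fit(meter, radar)
print(slope, intercept, rms)
```

Unlike ordinary least squares, this fit treats both series as noisy, which suits a comparison where neither instrument is taken as exact.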