Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining investors' returns. As a result, predicting companies' credit ratings with statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, a major drawback is that they rely on strict assumptions, such as linearity, normality, independence among predictor variables, and a pre-specified functional form relating the criterion variables to the predictor variables. These assumptions have limited their application to real-world problems. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and Support Vector Machines (SVM). SVM in particular is recognized as a new and promising method for classification and regression. SVM learns a separating hyperplane that maximizes the margin between two categories. It is simple enough to be analyzed mathematically, and it achieves high performance in practical applications. SVM implements the structural risk minimization principle, seeking to minimize an upper bound on the generalization error. In addition, because SVM training is a convex optimization problem, its solution is a global optimum, and thus overfitting is unlikely to occur. SVM also does not require a large number of training samples, since its prediction model is built only from a subset of representative samples near the class boundary, called support vectors. A number of experimental studies have shown that SVM has been applied successfully in a variety of pattern recognition fields. However, SVM has three major drawbacks that can degrade its performance.
First, SVM was originally proposed for binary classification problems. Methods that combine SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not perform as well on multi-class problems as SVM does on binary problems. Second, approximation algorithms (e.g., decomposition methods and the sequential minimal optimization algorithm) can be used to reduce computation time in multi-class learning, but they may deteriorate classification performance. Third, multi-class prediction problems often suffer from data imbalance, which occurs when the number of instances in one class greatly outnumbers the number in another. Such data sets often skew the decision boundary, so that the learned classifier degenerates into a default classifier that favors the majority class, reducing classification accuracy. SVM ensemble learning is one machine learning approach for coping with these drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms, and AdaBoost is one of the most widely used ensemble techniques. AdaBoost constructs a composite classifier by training classifiers sequentially while increasing the weights of misclassified observations at each iteration. Observations that previous classifiers predicted incorrectly are chosen more often than those predicted correctly. Boosting thus attempts to produce new classifiers that are better at predicting the examples on which the current ensemble performs poorly, and in this way it can reinforce training on the misclassified observations of the minority class. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to address the multiclass prediction problem.
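The reweighting loop described above can be sketched as follows. This abstract does not give MGM-Boost's own update rule, so the sketch substitutes SAMME, a standard multiclass extension of AdaBoost. It is not the paper's method, and the function name `samme_round` is hypothetical. With K = 2 the extra log(K - 1) term vanishes and the update reduces to discrete AdaBoost.

```python
import math


def samme_round(weights, misclassified, n_classes):
    """One SAMME reweighting round (a multiclass AdaBoost variant).

    weights:       current sample weights, summing to 1
    misclassified: booleans, True where the current weak learner erred
    n_classes:     number of classes K (e.g., the number of rating grades)
    Returns (alpha, new_weights).
    """
    # Weighted error of the weak learner under the current distribution.
    err = sum(w for w, m in zip(weights, misclassified) if m)
    # SAMME needs 0 < err < 1 - 1/K, i.e. better than random guessing over K classes
    # (a perfect learner, err == 0, would normally just stop the boosting loop).
    if not 0.0 < err < 1.0 - 1.0 / n_classes:
        raise ValueError("weighted error must lie in (0, 1 - 1/K)")
    # Learner weight; the log(K - 1) term is what extends AdaBoost to K > 2 classes.
    alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
    # Multiply misclassified samples' weights by exp(alpha), then renormalize,
    # so the next weak learner concentrates on them.
    raw = [w * math.exp(alpha) if m else w for w, m in zip(weights, misclassified)]
    total = sum(raw)
    return alpha, [r / total for r in raw]


# Four equally weighted samples, one misclassified, three classes.
alpha, new_weights = samme_round([0.25] * 4, [True, False, False, False], n_classes=3)
print(round(alpha, 4), [round(w, 4) for w in new_weights])
# prints 1.7918 [0.6667, 0.1111, 0.1111, 0.1111]
```

The single misclassified sample goes from a quarter of the total weight to two thirds, which is the mechanism the abstract relies on to push later learners toward hard (often minority-class) observations.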
Because MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process takes into account geometric mean-based accuracy and errors across multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. Ten-fold cross-validation is repeated three times with different random seeds to ensure that differences among the three classifiers do not arise by chance. In each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized subsets, and each subset is in turn used as the test set while the classifier is trained on the other nine. Each algorithm is tested independently on the cross-validated folds. These steps yield results for each classifier on 30 experiments. In terms of arithmetic mean-based prediction accuracy, MGM-Boost (52.95%) outperforms both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also achieves higher geometric mean-based prediction accuracy than AdaBoost (24.65%) and SVM (15.42%). A t-test is used to examine whether the performance of the classifiers over the 30 folds differs significantly. The results indicate that the performance of MGM-Boost differs significantly from that of the AdaBoost and SVM classifiers at the 1% level. These results indicate that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
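The gap between the arithmetic and geometric mean-based accuracies can be illustrated with a short sketch. The abstract does not define either metric, so the sketch assumes both are means of per-class recall computed from a confusion matrix. The function names and the two matrices are hypothetical.

```python
import math


def per_class_recall(confusion):
    """Recall of each class from a K x K confusion matrix (rows = true class).
    Assumes every class has at least one test instance."""
    return [row[k] / sum(row) for k, row in enumerate(confusion)]


def mean_accuracies(confusion):
    """Return (arithmetic mean, geometric mean) of the per-class recalls."""
    recalls = per_class_recall(confusion)
    arithmetic = sum(recalls) / len(recalls)
    # A single class with zero recall makes the product, and hence the
    # geometric mean, zero, so a classifier that ignores a minority grade is penalized.
    geometric = math.prod(recalls) ** (1.0 / len(recalls))
    return arithmetic, geometric


confusion_a = [[8, 2, 0], [1, 4, 0], [0, 1, 1]]  # recalls 0.8, 0.8, 0.5
confusion_b = [[9, 1, 0], [1, 9, 0], [2, 0, 0]]  # recalls 0.9, 0.9, 0.0
print([round(x, 4) for x in mean_accuracies(confusion_a)])  # [0.7, 0.684]
print([round(x, 4) for x in mean_accuracies(confusion_b)])  # [0.6, 0.0]
```

Because the geometric mean collapses whenever any class is poorly recognized, it falls well below the arithmetic mean on imbalanced data. That pattern is consistent with the much lower geometric mean-based figures reported above.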

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin; Lee, Jonghoon; Han, Sangjin; Park, Choong-Shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.57-73 / 2021
  • Maintenance and failure prevention through anomaly detection in ICT infrastructure is becoming increasingly important. System monitoring data is multidimensional time series data, and handling it requires considering the characteristics of both multidimensional data and time series data. For multidimensional data, the correlations between variables must be considered. Existing methods, such as probability- and linearity-based methods and distance-based methods, degrade because of the curse of dimensionality. In addition, time series data is preprocessed by applying a sliding window technique and time series decomposition for autocorrelation analysis. These techniques increase the dimension of the data, so they need to be supplemented. Anomaly detection is a long-established research field in which statistical methods and regression analysis were used in the early days. Currently, there are active studies applying machine learning and artificial neural networks to this field. Statistical methods are difficult to apply when data is non-homogeneous, and they do not detect local outliers well. The regression analysis method learns a regression formula based on parametric statistics and then detects anomalies by comparing predicted values with actual values. Anomaly detection using regression analysis has the disadvantage that performance drops when the model is not robust or when the data contain noise or outliers, which imposes the restriction that the training data must be free of noise and outliers. An autoencoder, an artificial neural network, is trained to produce output as similar as possible to its input. It has many advantages over existing probabilistic and linear models, cluster analysis, and supervised learning. It can be applied to data that does not satisfy probability distribution or linearity assumptions.
In addition, it can learn in an unsupervised manner, without labeled training data. However, it is limited in identifying local outliers in multidimensional data, and the characteristics of time series data greatly increase the data's dimension. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that improves anomaly detection performance by considering local outliers and time series characteristics. First, we applied a Multimodal Autoencoder (MAE) to overcome the limitation in identifying local outliers in multidimensional data. Multimodal models are commonly used to learn from different types of input, such as voice and images. The different modalities share the autoencoder's bottleneck, through which their correlations are learned. In addition, a CAE (Conditional Autoencoder) was used to learn the characteristics of time series data effectively without increasing the dimension of the data. Conditional inputs are usually categorical variables, but in this study time was used as the condition in order to learn periodicity. The proposed CMAE model was verified by comparison with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). The reconstruction performance of the autoencoders for 41 variables was examined in the proposed model and the comparison models. Reconstruction performance differs by variable. Reconstruction works well for the Memory, Disk, and Network modalities in all three autoencoder models, as their loss values are small. The Process modality showed no significant difference among the three models, and the CPU modality showed the best performance in CMAE. ROC curves were drawn to evaluate the anomaly detection performance of the proposed and comparison models, and AUC, accuracy, precision, recall, and F1-score were compared. On all metrics, performance ranked in the order CMAE, MAE, UAE.
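The time conditioning described above can be sketched as input construction only; the autoencoder itself is not shown. The abstract does not say how time is encoded, so this assumes a sin/cos encoding of hour-of-day and day-of-week appended to every modality. The function names, the toy feature values, and the 60-step window are hypothetical. Only the 41-variable count and the five modality names come from the study.

```python
import math
from datetime import datetime


def time_condition(ts: datetime) -> list[float]:
    """Encode time of day and day of week on the unit circle, so that
    23:59 and 00:00 (or Sunday and Monday) end up close together."""
    hour = ts.hour + ts.minute / 60.0
    day = ts.weekday()  # Monday = 0
    return [
        math.sin(2 * math.pi * hour / 24), math.cos(2 * math.pi * hour / 24),
        math.sin(2 * math.pi * day / 7), math.cos(2 * math.pi * day / 7),
    ]


def conditional_inputs(modalities: dict[str, list[float]], ts: datetime) -> dict[str, list[float]]:
    """Append the same time condition to every modality's feature vector."""
    cond = time_condition(ts)
    return {name: feats + cond for name, feats in modalities.items()}


def sliding_window_dim(n_vars: int, window: int) -> int:
    """Input size when `window` past steps of `n_vars` variables are flattened."""
    return n_vars * window


sample = {"cpu": [0.62, 0.10], "memory": [0.48], "disk": [0.05], "network": [0.31]}
inputs = conditional_inputs(sample, datetime(2021, 3, 1, 6, 0))
print({k: len(v) for k, v in inputs.items()})  # {'cpu': 6, 'memory': 5, 'disk': 5, 'network': 5}
# 41 variables: a 60-step window vs. 4 condition features per modality (5 modalities).
print(sliding_window_dim(41, 60), 41 + 4 * 5)  # 2460 61
```

The condition adds a fixed number of features that does not depend on history length, whereas a sliding window multiplies the input size by the window length. This is the dimensional argument the abstract makes for conditioning on time.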
CMAE achieved a recall of 0.9828, indicating that it detects almost all anomalies. Its accuracy also improved, to 87.12%, and its F1-score was 0.8883, which is considered suitable for anomaly detection. In practical terms, the proposed model has an advantage beyond the performance improvement. Techniques such as time series decomposition and sliding windows add procedures that must be managed, and the dimension increase they cause can slow down computation at inference time. The proposed model is easy to apply to practical tasks with respect to inference speed and model management.
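The evaluation metrics named above can be computed from per-sample reconstruction errors, as in the sketch below. The abstract does not describe how the anomaly threshold is chosen, so `threshold` is a hypothetical parameter. The function names are illustrative, and the error values and labels are toy data.

```python
def threshold_metrics(errors, labels, threshold):
    """Flag a sample as anomalous when its reconstruction error exceeds
    `threshold`, then score the flags against ground truth (1 = anomaly)."""
    tp = fp = tn = fn = 0
    for error, label in zip(errors, labels):
        predicted = error > threshold
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(labels)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(errors, labels):
    """Threshold-free ROC AUC: the probability that a randomly chosen anomaly
    has a larger reconstruction error than a randomly chosen normal sample
    (ties count as one half)."""
    pos = [e for e, y in zip(errors, labels) if y]
    neg = [e for e, y in zip(errors, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


errors = [0.1, 0.6, 0.9, 0.4, 0.3, 0.7]  # toy reconstruction errors
labels = [0, 0, 1, 1, 0, 1]              # toy ground truth
print(threshold_metrics(errors, labels, threshold=0.5))  # each metric is 2/3 (up to float rounding)
print(roc_auc(errors, labels))                           # 8/9, about 0.889
```

Recall, precision, F1, and accuracy all depend on the chosen threshold, while AUC summarizes the whole ROC curve. That is why the study reports AUC alongside the threshold-based scores.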

Innovative Technology of Landfill Stabilization Combining Leachate Recirculation with Shortcut Biological Nitrogen Removal Technology (침출수 재순환과 생물학적 단축질소제거공정을 병합한 매립지 조기안정화 기술 연구)

  • Shin, Eon-Bin; Chung, Jin-Wook; Bae, Woo-Keun; Kim, Seung-Jin; Baek, Seung-Cheon
    • Journal of Korean Society of Environmental Engineers / v.29 no.9 / pp.1035-1043 / 2007
  • Leachate containing elevated concentrations of organic and inorganic compounds can contaminate adjacent soils and groundwater as well as downgradient areas of the watershed. Moreover, high-strength ammonium in leachate can be toxic to aquatic ecosystems, consumes dissolved oxygen through ammonium oxidation, and can contribute to eutrophication of the watershed. In response to these concerns, landfill stabilization and leachate treatment are required to reduce contaminant loadings and minimize environmental effects. Compared with other treatment technologies, leachate recirculation is the most effective for pre-treating leachate and accelerating waste stabilization in a landfill. However, by accelerating the decomposition of readily degradable organic matter, leachate recirculation may also generate high-strength ammonium in the leachate. Because most landfill leachates with high nitrogen concentrations also contain too little organic carbon for complete denitrification, we combined leachate recirculation with a shortcut biological nitrogen removal (SBNR) technology to overcome the inability to denitrify oxidized ammonium caused by the lack of carbon sources. Nitrite accumulation was successfully achieved at a $\mathrm{NO_2^-\text{-}N}/\mathrm{NO_x\text{-}N}$ ratio of 0.8 in an on-site sequencing batch reactor (SBR) operated with a six-hour aeration phase. The $\mathrm{NO_x\text{-}N}$ ratio of the SBR-treated leachate decreased in the landfill, implying that denitrification occurred through sulfur-based autotrophic and/or heterotrophic pathways. Leachate recirculation combined with SBNR proved to be an effective technology for landfill stabilization and nitrogen removal from leachate.
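The reported ratio, and the savings that motivate stopping nitrification at nitrite, can be written out with standard textbook stoichiometry (general values, not results from this study). The ratio is the nitrite share of oxidized nitrogen, so a value of 0.8 means about 80% of the $\mathrm{NO_x\text{-}N}$ is present as nitrite:

$$\frac{\mathrm{NO_2^-\text{-}N}}{\mathrm{NO_x\text{-}N}} = \frac{\mathrm{NO_2^-\text{-}N}}{\mathrm{NO_2^-\text{-}N} + \mathrm{NO_3^-\text{-}N}} = 0.8$$

Nitritation and nitratation proceed as:

$$\mathrm{NH_4^+} + 1.5\,\mathrm{O_2} \rightarrow \mathrm{NO_2^-} + \mathrm{H_2O} + 2\,\mathrm{H^+}, \qquad \mathrm{NO_2^-} + 0.5\,\mathrm{O_2} \rightarrow \mathrm{NO_3^-}$$

Stopping at nitrite therefore uses 1.5 instead of 2 mol $\mathrm{O_2}$ per mol N:

$$1 - \frac{1.5}{2} = 0.25$$

Reducing nitrate (N at +5) to $\mathrm{N_2}$ (0) takes 5 electrons per N, while nitrite (N at +3) takes 3. The electron donor demand, which in heterotrophic denitrification is organic carbon, falls by:

$$1 - \frac{3}{5} = 0.4$$

This roughly 40% reduction in carbon demand is why the shortcut route suits carbon-poor leachate.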

A study on the improvement of distribution system by overseas agricultural investment (해외농업투자에 따른 유통체계 개선방안에 관한 연구)

  • Sun, Il-Suck; Lee, Dong-Ok
    • Journal of Distribution Science / v.8 no.3 / pp.17-26 / 2010
  • Recently, concerns have been raised about the unbalanced supply of crops: crop prices have been unstable, and at one point they rose so high that the word "agflation" (agriculture + inflation) was coined. Korea in particular is a small country and needs to secure a stable supply of crops by investing in produce importation at the national level. Investment in foreign produce importation is becoming more important as a means of securing a sufficient crop supply, given the limited supply of domestic crops, weakening farming conditions worldwide, and recent changes such as the use of crops for bio-fuels, the effect of carbon emissions on crops, rising crop prices, and the influx of foreign hot money. However, investing in foreign produce importation faces many problems: lack of government support, lack of farming information and technology, difficulty in securing capital, no immediate pay-off from the investment, and insufficient management. Although foreign produce is inherently more price-competitive than domestic produce, it loses its competitiveness in the importation process (due to high tariffs) and through a poor distribution system, which makes it difficult to sell in Korea. The feasibility of investment in foreign produce importation is therefore being questioned; to make it viable, foreign produce must remain price-competitive. In particular, because agricultural harvests depend on each country's natural and geographical conditions and the products have indigenous properties, distribution systems for the import and export of agricultural products must be handled more carefully than those of other industries. Distribution costs are broken down by item and include the costs of sorting and wrapping, wrapping materials, domestic transport, international transport, and customs clearance for import and export.
Transporting and storing agricultural products therefore generates considerable costs compared with other products. Moreover, as diets improve, demands for safety, taste, and visual quality in food, including agricultural products, are rising. Improper storage causes food to spoil and lose freshness, making storage more difficult than for goods kept at room temperature, so storage and transport in agricultural distribution require specialized expertise. In addition, because a lack of expertise in distribution functions such as storage and wrapping prevents distance constraints from being overcome, distribution has been limited to import and export within short-distance regions. There is therefore a need for distribution outsourcing that can provide this expertise, and a more effective distribution system must be established. However, the existing distribution system for agricultural products faces various problems in its distribution channels, distribution facilitation, and distribution strategy, as follows. First, in overseas agricultural investment, a stable supply of products is difficult because production areas are widely dispersed and, since overseas distribution channels are involved, subject to external factors. In terms of quality, product standardization is difficult. The distribution system is complicated and inefficient because international trade lengthens the distribution channels, and financial and institutional support is insufficient. In particular, there are many inefficiencies, including a multi-level distribution process, a dramatic gap between production cost and consumer price, a lack of physical distribution facilities, and difficulties in storage and transport due to a lack of packaging containers.
Besides, because the import and export of agricultural products have been managed through each company's own distribution under transaction contracts between producers and exporters, efficiency is low due to excessive investment in fixed costs. A lack of expertise in handling agricultural products also lowers product value, leading to a loss of price competitiveness. Second, among the tangible and intangible services that promote the efficiency of distribution as a whole, a distribution-facilitating function is needed, covering distribution information, standards and inspection systems, distribution finance, risk diversification, education and training, distribution administration, and the tax system. In general, such a facilitating function is difficult to change or supplement innovatively because, despite its necessity, its return on investment does not appear immediately. In the distribution of agricultural products in particular, collection and distribution are performed individually through various channels, so distribution information and standardization are receiving more attention because of duplicated work and a lack of expertise. Efficient distribution management is also quite difficult because of a shortage of distribution professionals, so support for professional education is needed.
Third, although the government regards maintaining self-sufficiency in the staple food, rice, as important, dependency on overseas supply for other crops is high. A plan for securing stable food resources beyond the staple food is therefore also necessary. In particular, Korea's governmental organizations for agricultural distribution are production-centered and have an unbalanced structure whose distribution and consumption functions are quite insufficient. Moreover, new distribution channels that can respond to changes in the distribution environment have not been developed, and distribution strategy has not achieved actual results because pricing strategy has been passive. That is, by emphasizing consumer protection excessively in agricultural distribution, the system fails to mediate the interests of all participants in the distribution channel, such as producers, wholesalers, and vendors, which implies that the supply base may become vulnerable. This study therefore examined the fundamental concept and actual state of Korea's overseas agricultural investment. It identified the necessity, considerations, and problems of such investment and suggested distribution-level improvements for the price competitiveness of agricultural products cultivated overseas, under five aspects: indirect government support; modernization of distribution and strengthening of the distribution information function; government policy support for distribution facilities, transportation routes, and improved loading and unloading; securing price competitiveness; and cultivation of professional manpower through education and training. The suggestions for foreign produce importation are as follows. First, the government should survey the current distribution channels and analyze the situation to establish long-term development plans.
By providing each agricultural area with guidelines for planning appropriate crop production, the government can help farmers prepare for importation and prevent them from all producing the same crops at the same time. The government can also sign MOUs with foreign governments and promote importation so that agricultural resources can be developed stably and steadily. Second, the government can establish an effective distribution system by providing farmers and agriculture-related workers with distribution information such as prices, production, demand, market structure and location, and the features of each crop. To make such a system feasible, the government needs to restructure the current distribution system, designate a public organization to provide distribution information, and set criteria for produce quality levels, trade units, and package units. Third, the government should provide financial support and policies for an efficient distribution channel that delivers foreign produce fresh. It should expand distribution facilities (for sorting, packaging, storing, and processing) and transportation vehicles while modernizing old facilities. Another policy should improve the efficiency of unloading and lower distribution costs. Fourth, to maintain price competitiveness, a new law covering exceptional cases for imported produce is needed, because high tariffs currently keep imported produce from being distributed domestically. However, any such adjustment must be made carefully within WTO regulations, since preferential tariffs can create problems. The government can also simplify distribution channels to reduce costs in the distribution process. Fifth, the government should educate distributors to raise efficiency and modernize the distribution system.
It is necessary to develop human resources by educating people about the foreign agricultural environment, produce quality, and management skills, and by introducing successful cases from advanced countries.
