• Title/Summary/Keyword: Imbalance Problem

Combined Application of Data Imbalance Reduction Techniques Using Genetic Algorithm (유전자 알고리즘을 활용한 데이터 불균형 해소 기법의 조합적 활용)

  • Jang, Yeong-Sik;Kim, Jong-U;Heo, Jun
    • Proceedings of the Korea Intelligent Information System Society Conference
    • /
    • 2007.05a
    • /
    • pp.309-320
    • /
    • 2007
  • The data imbalance problem, which can be encountered in data mining classification problems, typically means that one class has considerably more or fewer instances than the other classes. It causes low prediction accuracy for the minority class because classifiers tend to assign instances to the majority classes and ignore the minority class in order to reduce the overall misclassification rate. To solve the data imbalance problem, a number of techniques have been proposed based on resampling with replacement, adjusting decision thresholds, and adjusting the costs of the different classes. In this paper, we study the feasibility of combining the previously proposed techniques for dealing with data imbalance, and suggest a combination method that uses a genetic algorithm to find the optimal combination ratio of the techniques. To improve the prediction accuracy of the minority class, we use the F-value of the minority class as the fitness function of the genetic algorithm when determining the combination ratio. To compare its performance with that of the single techniques and of a matrix-style combination with random percentages, we performed experiments on four public datasets that have been generally used to compare methods for the data imbalance problem. The experimental results show the usefulness of the proposed method.


Experimental Evaluation of Q-Parameterization Control for the Imbalance Compensation of Magnetic Bearing System (Q-매개변수화 제어를 이용한 자기축수 시스템의 불평형 보상에 대한 실험적평가)

  • Lee, Jun-Ho;Kim, Hyeon-Gi;Lee, Jeong-Seok;Lee, Gi-Seo
    • The Transactions of the Korean Institute of Electrical Engineers A
    • /
    • v.48 no.3
    • /
    • pp.278-285
    • /
    • 1999
  • This paper uses Q-parameterization control to design a controller that solves the problem of imbalance in magnetic bearing systems. There are two methods of solving this problem with feedback control. The first is to compensate for the imbalance forces by generating opposing forces on the bearing surface (imbalance compensation). The second is to make the rotor rotate around its axis of inertia (automatic balancing); in this case no imbalance forces are generated. In this paper we deal only with imbalance compensation. The free parameter of the Q-parameterization controller is chosen such that these goals are achieved. After introducing a model of the magnetic bearing system, we explain the Q-parameterization controller design for the system, with emphasis on the rejection of sinusoidal disturbances for imbalance compensation. The design objectives are formulated as linear equations in the controller free parameter Q. Finally, simulation and experimental results are presented, and they show the robustness and effectiveness of the proposed controllers.


Verification of gate balancing equation using injection molding analysis (사출성형해석 연구를 이용한 게이트 밸런스 계산식의 검증)

  • Han, Seong-Ryeol
    • Design & Manufacturing
    • /
    • v.12 no.3
    • /
    • pp.55-59
    • /
    • 2018
  • In a multi-cavity mold with a fishbone-structure runner layout, unbalanced filling between cavities occurs constantly. Unbalanced filling lowers the dimensional accuracy of a molded article and causes deformation after molding. To solve this problem, the size of the gate connected to each cavity is adjusted using the BGV (Balanced Gate Value) equation. In this paper, to address the filling imbalance of a fishbone-structure runner layout mold, we used injection molding analysis to compare the filling imbalance before and after adjusting the gate sizes with the BGV equation. According to the molding analysis, the shrinkage ratio of the molded article improved by only about 0.08% after the adjustment. Based on these results, it was confirmed that the filling imbalance problem was not significantly improved even when the BGV equation was applied.

Combined Application of Data Imbalance Reduction Techniques Using Genetic Algorithm (유전자 알고리즘을 활용한 데이터 불균형 해소 기법의 조합적 활용)

  • Jang, Young-Sik;Kim, Jong-Woo;Hur, Joon
    • Journal of Intelligence and Information Systems
    • /
    • v.14 no.3
    • /
    • pp.133-154
    • /
    • 2008
  • The data imbalance problem, which can be encountered in data mining classification problems, typically means that one class has considerably more or fewer instances than the other classes. To solve the data imbalance problem, a number of techniques have been proposed based on re-sampling with replacement, adjusting decision thresholds, and adjusting the costs of the different classes. In this paper, we study the feasibility of combining the previously proposed techniques for dealing with data imbalance, and suggest a combination method that uses a genetic algorithm to find the optimal combination ratio of the techniques. To improve the prediction accuracy of the minority class, we use the F-value of the minority class as the fitness function of the genetic algorithm when determining the combination ratio. To compare its performance with that of the single techniques and of a matrix-style combination with random percentages, we performed experiments on four public datasets that have been generally used to compare methods for the data imbalance problem. The experimental results show the usefulness of the proposed method.


A Methodology for Bankruptcy Prediction in Imbalanced Datasets using eXplainable AI (데이터 불균형을 고려한 설명 가능한 인공지능 기반 기업부도예측 방법론 연구)

  • Heo, Sun-Woo;Baek, Dong Hyun
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.45 no.2
    • /
    • pp.65-76
    • /
    • 2022
  • Recently, not only traditional statistical techniques but also machine learning algorithms have been used to make more accurate bankruptcy predictions. However, the insolvency rate of companies dealing with financial institutions is very low, which results in a data imbalance problem. Since data imbalance negatively affects the performance of artificial intelligence models, the imbalance must be addressed first. In addition, as artificial intelligence algorithms become more advanced for precise decision-making, regulatory pressure to secure the transparency of artificial intelligence models is gradually increasing, for example by mandating explanation functions for such models. Therefore, this study aims to present guidelines for an eXplainable Artificial Intelligence-based corporate bankruptcy prediction methodology that applies the SMOTE technique and the LIME algorithm to address the data imbalance problem and the model transparency problem in predicting corporate bankruptcy. The implications of this study are as follows. First, it was confirmed that SMOTE can effectively address data imbalance, a problem that is easily overlooked in predicting corporate bankruptcy. Second, the LIME algorithm was used to visualize the basis for the machine learning model's bankruptcy predictions and to derive improvement priorities for the financial variables that increase the possibility of bankruptcy. Third, the scope of application of these algorithms in future research was expanded by confirming, through a case application, that SMOTE and LIME can be used together.
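The abstract above names SMOTE without describing it. As a rough illustration only, SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified pure-Python version of that idea, not the authors' implementation; the function name `smote_sketch`, the parameter `k`, and the toy data are all assumptions made for this example.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself),
        # ranked by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical two-feature vectors for the rare (bankrupt) class; values are made up.
bankrupt = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15), (0.95, 0.05)]
new_points = smote_sketch(bankrupt, n_new=6)
print(len(new_points))
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies, which is the main difference from simply duplicating rows.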

Learning Behavior Analysis of Bayesian Algorithm Under Class Imbalance Problems (클래스 불균형 문제에서 베이지안 알고리즘의 학습 행위 분석)

  • Hwang, Doo-Sung
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.45 no.6
    • /
    • pp.179-186
    • /
    • 2008
  • In this paper we analyse the effects of the Bayesian algorithm in learning class imbalance problems and compare performance evaluation methods. The learning performance of the Bayesian algorithm is evaluated on class imbalance problems generated by varying the prior data distribution, the imbalance rate of the data, and the discrimination complexity. The experimental results are computed as AUC (Area Under the Curve) values of both the ROC (Receiver Operating Characteristic) and PR (Precision-Recall) evaluation measures and are compared according to imbalance rate and discrimination complexity. The analysis shows that the Bayesian algorithm suffers from the imbalance rate, consistent with previously reported research, and that the data overlap caused by discrimination complexity is another factor that hampers learning performance. As the discrimination complexity and class imbalance rate of the problems increase, the AUC of the PR measure varies much more than the AUC of the ROC measure. However, the two measures behave similarly when the discrimination complexity and class imbalance rate are low. The experimental results show that the AUC of the PR measure is more suitable for evaluating learning on class imbalance problems and, furthermore, is beneficial in designing an optimal learning model that considers misclassification cost.
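To make the ROC-versus-PR contrast in this abstract concrete, the following sketch computes both summaries on a toy ranking and then repeats the calculation after multiplying the number of negatives. It is an illustration under stated assumptions, not the paper's procedure: ROC AUC is computed as the pairwise ranking probability, the PR area is approximated by average precision (one common estimator), and all labels and scores are invented.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at each positive, in descending score order.

    Assumes no positive shares a score with a negative.
    """
    ranked = sorted(zip(scores, labels), reverse=True)
    total_pos = sum(labels)
    hits, area = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            area += hits / rank  # precision at this recall step
    return area / total_pos


# Toy data: 2 positives among 10 samples.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

# Same score distribution, but every negative now appears ten times.
neg_scores = [s for y, s in zip(labels, scores) if y == 0]
labels_heavy = labels + [0] * (9 * len(neg_scores))
scores_heavy = scores + neg_scores * 9

print(roc_auc(labels, scores), roc_auc(labels_heavy, scores_heavy))
print(average_precision(labels, scores), average_precision(labels_heavy, scores_heavy))
```

In this toy case the ROC AUC is 0.875 for both datasets, while the average precision falls from 0.75 to roughly 0.55 once the negatives are multiplied, which matches the abstract's observation that the PR-based measure reacts more strongly to the imbalance rate.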

The Optimization of Ensembles for Bankruptcy Prediction (기업부도 예측 앙상블 모형의 최적화)

  • Myoung Jong Kim;Woo Seob Yun
    • Information Systems Review
    • /
    • v.24 no.1
    • /
    • pp.39-57
    • /
    • 2022
  • This paper proposes the GMOPTBoost algorithm to improve the performance of the AdaBoost algorithm for bankruptcy prediction, in which the class imbalance problem is inherent. The AdaBoost algorithm has the advantage of providing a robust learning opportunity for misclassified samples. However, it is limited in addressing the class imbalance problem because the concept of arithmetic mean accuracy is embedded in it. GMOPTBoost optimizes the geometric mean accuracy and effectively addresses the class imbalance problem by applying Gaussian gradient descent. The samples are constructed in two phases. First, five class-imbalanced datasets are constructed to verify the effect of the class imbalance problem on the performance of the prediction model and the performance improvement provided by GMOPTBoost. Second, class-balanced data are constructed through data sampling techniques to verify the performance improvement provided by GMOPTBoost. The main results of 30 rounds of cross-validation analysis are as follows. First, the class imbalance problem degrades the performance of ensembles. Second, GMOPTBoost contributes to performance improvements of AdaBoost ensembles trained on imbalanced datasets. Third, data sampling techniques have a positive impact on performance. Finally, GMOPTBoost contributes to significant performance improvement of AdaBoost ensembles trained on balanced datasets.
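The distinction between arithmetic mean accuracy and geometric mean accuracy can be illustrated with a small calculation. The sketch below assumes the usual definition of geometric mean accuracy as the square root of sensitivity times specificity; the abstract does not state the exact formula used by GMOPTBoost, and the confusion-matrix counts are invented.

```python
import math


def arithmetic_accuracy(tp, fn, tn, fp):
    """Share of all samples classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def geometric_mean_accuracy(tp, fn, tn, fp):
    """Square root of (sensitivity * specificity); assumed definition."""
    sensitivity = tp / (tp + fn)  # accuracy on the positive (bankrupt) class
    specificity = tn / (tn + fp)  # accuracy on the negative (healthy) class
    return math.sqrt(sensitivity * specificity)


# Hypothetical result: 10 bankrupt firms, 990 healthy firms,
# and a classifier that detects only one of the bankrupt firms.
acc = arithmetic_accuracy(tp=1, fn=9, tn=990, fp=0)
gm = geometric_mean_accuracy(tp=1, fn=9, tn=990, fp=0)
print(round(acc, 3), round(gm, 3))
```

Here the arithmetic accuracy is 0.991 even though nine of ten bankruptcies are missed, whereas the geometric mean is about 0.316, because it is pulled down by the poor accuracy on the minority class.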

A New Required Reserve Capacity Determining Scheme with Regard to Real time Load Imbalance

  • Park, Joon Hyung;Kim, Sun Kyo;Yoon, Yong Tae
    • Journal of Electrical Engineering and Technology
    • /
    • v.10 no.2
    • /
    • pp.511-517
    • /
    • 2015
  • Determining the required reserve capacity plays an important role in power system operation, and it is conventionally calculated based on the largest loss of supply. However, the conventional method cannot be applied to future power systems, because grid-connected distributed generators and abnormal temperatures can cause large load imbalances. Therefore, this paper presents a new framework for determining the optimal required reserve capacity that takes real-time load imbalance into account. First, we introduce how to operate the reserve resources, namely the secondary, tertiary, Direct Load Control (DLC), and load-shedding reserves, to make up for the load imbalance. The formulated problem is then solved by the Probabilistic Dynamic Programming (PDP) method. In the case study, we consider two cases to compare the cost function of the conventional method with that of the proposed method.

Development of Evaluation Metrics that Consider Data Imbalance between Classes in Facies Classification (지도학습 기반 암상 분류 시 클래스 간 자료 불균형을 고려한 평가지표 개발)

  • Kim, Dowan;Choi, Junhwan;Byun, Joongmoo
    • Geophysics and Geophysical Exploration
    • /
    • v.23 no.3
    • /
    • pp.131-140
    • /
    • 2020
  • In training a classification model using machine learning, the acquisition of training data is a very important stage, because the amount and quality of the training data greatly influence model performance. However, when the cost of obtaining data is so high that it is difficult to build ideal training data, the number of samples acquired for each class may differ greatly, and a serious data-imbalance problem can occur. If such a problem occurs in the training data, not all classes are trained equally, and classes containing relatively few samples will have significantly lower recall values. Additionally, the reliability of evaluation indices such as accuracy and precision will be reduced. Therefore, this study sought to overcome the problem of data imbalance in two stages. First, we introduced weighted accuracy and weighted precision as new evaluation indices that take the data-imbalance ratio into account by modifying the conventional measures of accuracy and precision. Next, oversampling was performed to balance weighted precision and recall among classes. We verified the algorithm by applying it to the problem of facies classification. As a result, the imbalance between majority and minority classes was greatly mitigated, and the boundaries between classes could be more clearly identified.

Credit Card Bad Debt Prediction Model based on Support Vector Machine (신용카드 대손회원 예측을 위한 SVM 모형)

  • Kim, Jin Woo;Jhee, Won Chul
    • Journal of Information Technology Services
    • /
    • v.11 no.4
    • /
    • pp.233-250
    • /
    • 2012
  • In this paper, credit card delinquency means the possibility that bad debt will occur in the near future on normal accounts that currently have no debt, and the problem is to predict, on a monthly basis, the occurrence of delinquency three months in advance. This is a typical binary classification problem, but it suffers from data imbalance: instances of the target class are very few. To predict bad debt occurrence effectively, a Support Vector Machine (SVM) with the kernel trick is adopted, using credit card usage and payment patterns as its inputs. SVM is widely accepted in the data mining community because of its prediction accuracy and low risk of overfitting. However, SVM is known to be limited in its ability to process large-scale data. To resolve the difficulties of applying SVM to bad debt prediction, two-stage clustering is suggested as an effective data reduction method, and ensembles of SVM models are adopted to mitigate the difficulty caused by the data imbalance intrinsic to the target problem. In experiments with real-world data from one of the major domestic credit card companies, the suggested approach shows prediction accuracy superior to that of traditional data mining approaches that use neural networks, decision trees, or logistic regression. The SVM ensemble model learned from the T2 training set shows the best prediction results among the alternatives considered, and it is noteworthy that the performance of neural networks with T2 is better than that of SVM with T1. These results demonstrate that the suggested approach is very effective for both SVM training and classification under data imbalance.