• Title/Summary/Keyword: class imbalance

Search Result 126, Processing Time 0.022 seconds

Learning Behavior Analysis of Bayesian Algorithm Under Class Imbalance Problems (클래스 불균형 문제에서 베이지안 알고리즘의 학습 행위 분석)

  • Hwang, Doo-Sung
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.45 no.6
    • /
    • pp.179-186
    • /
    • 2008
  • In this paper we analyse the effects of Bayesian algorithm in teaming class imbalance problems and compare the performance evaluation methods. The teaming performance of the Bayesian algorithm is evaluated over the class imbalance problems generated by priori data distribution, imbalance data rate and discrimination complexity. The experimental results are calculated by the AUC(Area Under the Curve) values of both ROC(Receiver Operator Characteristic) and PR(Precision-Recall) evaluation measures and compared according to imbalance data rate and discrimination complexity. In comparison and analysis, the Bayesian algorithm suffers from the imbalance rate, as the same result in the reported researches, and the data overlapping caused by discrimination complexity is the another factor that hampers the learning performance. As the discrimination complexity and class imbalance rate of the problems increase, the learning performance of the AUC of a PR measure is much more variant than that of the AUC of a ROC measure. But the performances of both measures are similar with the low discrimination complexity and class imbalance rate of the problems. The experimental results show 4hat the AUC of a PR measure is more proper in evaluating the learning of class imbalance problem and furthermore gets the benefit in designing the optimal learning model considering a misclassification cost.

A Study on Visual Emotion Classification using Balanced Data Augmentation (균형 잡힌 데이터 증강 기반 영상 감정 분류에 관한 연구)

  • Jeong, Chi Yoon;Kim, Mooseop
    • Journal of Korea Multimedia Society
    • /
    • v.24 no.7
    • /
    • pp.880-889
    • /
    • 2021
  • In everyday life, recognizing people's emotions from their frames is essential and is a popular research domain in the area of computer vision. Visual emotion has a severe class imbalance in which most of the data are distributed in specific categories. The existing methods do not consider class imbalance and used accuracy as the performance metric, which is not suitable for evaluating the performance of the imbalanced dataset. Therefore, we proposed a method for recognizing visual emotion using balanced data augmentation to address the class imbalance. The proposed method generates a balanced dataset by adopting the random over-sampling and image transformation methods. Also, the proposed method uses the Focal loss as a loss function, which can mitigate the class imbalance by down weighting the well-classified samples. EfficientNet, which is the state-of-the-art method for image classification is used to recognize visual emotion. We compare the performance of the proposed method with that of conventional methods by using a public dataset. The experimental results show that the proposed method increases the F1 score by 40% compared with the method without data augmentation, mitigating class imbalance without loss of classification accuracy.

유전자 알고리즘을 활용한 데이터 불균형 해소 기법의 조합적 활용

  • Jang, Yeong-Sik;Kim, Jong-U;Heo, Jun
    • Proceedings of the Korea Inteligent Information System Society Conference
    • /
    • 2007.05a
    • /
    • pp.309-320
    • /
    • 2007
  • The data imbalance problem which can be uncounted in data mining classification problems typically means that there are more or less instances in a class than those in other classes. It causes low prediction accuracy of the minority class because classifiers tend to assign instances to major classes and ignore the minor class to reduce overall misclassification rate. In order to solve the data imbalance problem, there has been proposed a number of techniques based on resampling with replacement, adjusting decision thresholds, and adjusting the cost of the different classes. In this paper, we study the feasibility of the combination usage of the techniques previously proposed to deal with the data imbalance problem, and suggest a combination method using genetic algorithm to find the optimal combination ratio of the techniques. To improve the prediction accuracy of a minority class, we determine the combination ratio based on the F-value of the minority class as the fitness function of genetic algorithm. To compare the performance with those of single techniques and the matrix-style combination of random percentage, we performed experiments using four public datasets which has been generally used to compare the performance of methods for the data imbalance problem. From the results of experiments, we can find the usefulness of the proposed method.

  • PDF

Severity-based Software Quality Prediction using Class Imbalanced Data

  • Hong, Euy-Seok;Park, Mi-Kyeong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.21 no.4
    • /
    • pp.73-80
    • /
    • 2016
  • Most fault prediction models have class imbalance problems because training data usually contains much more non-fault class modules than fault class ones. This imbalanced distribution makes it difficult for the models to learn the minor class module data. Data imbalance is much higher when severity-based fault prediction is used. This is because high severity fault modules is a smaller subset of the fault modules. In this paper, we propose severity-based models to solve these problems using the three sampling methods, Resample, SpreadSubSample and SMOTE. Empirical results show that Resample method has typical over-fit problems, and SpreadSubSample method cannot enhance the prediction performance of the models. Unlike two methods, SMOTE method shows good performance in terms of AUC and FNR values. Especially J48 decision tree model using SMOTE outperforms other prediction models.

The Optimization of Ensembles for Bankruptcy Prediction (기업부도 예측 앙상블 모형의 최적화)

  • Myoung Jong Kim;Woo Seob Yun
    • Information Systems Review
    • /
    • v.24 no.1
    • /
    • pp.39-57
    • /
    • 2022
  • This paper proposes the GMOPTBoost algorithm to improve the performance of the AdaBoost algorithm for bankruptcy prediction in which class imbalance problem is inherent. AdaBoost algorithm has the advantage of providing a robust learning opportunity for misclassified samples. However, there is a limitation in addressing class imbalance problem because the concept of arithmetic mean accuracy is embedded in AdaBoost algorithm. GMOPTBoost can optimize the geometric mean accuracy and effectively solve the category imbalance problem by applying Gaussian gradient descent. The samples are constructed according to the following two phases. First, five class imbalance datasets are constructed to verify the effect of the class imbalance problem on the performance of the prediction model and the performance improvement effect of GMOPTBoost. Second, class balanced data are constituted through data sampling techniques to verify the performance improvement effect of GMOPTBoost. The main results of 30 times of cross-validation analyzes are as follows. First, the class imbalance problem degrades the performance of ensembles. Second, GMOPTBoost contributes to performance improvements of AdaBoost ensembles trained on imbalanced datasets. Third, Data sampling techniques have a positive impact on performance improvement. Finally, GMOPTBoost contributes to significant performance improvement of AdaBoost ensembles trained on balanced datasets.

Centroid and Nearest Neighbor based Class Imbalance Reduction with Relevant Feature Selection using Ant Colony Optimization for Software Defect Prediction

  • B., Kiran Kumar;Gyani, Jayadev;Y., Bhavani;P., Ganesh Reddy;T, Nagasai Anjani Kumar
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.10
    • /
    • pp.1-10
    • /
    • 2022
  • Nowadays software defect prediction (SDP) is most active research going on in software engineering. Early detection of defects lowers the cost of the software and also improves reliability. Machine learning techniques are widely used to create SDP models based on programming measures. The majority of defect prediction models in the literature have problems with class imbalance and high dimensionality. In this paper, we proposed Centroid and Nearest Neighbor based Class Imbalance Reduction (CNNCIR) technique that considers dataset distribution characteristics to generate symmetry between defective and non-defective records in imbalanced datasets. The proposed approach is compared with SMOTE (Synthetic Minority Oversampling Technique). The high-dimensionality problem is addressed using Ant Colony Optimization (ACO) technique by choosing relevant features. We used nine different classifiers to analyze six open-source software defect datasets from the PROMISE repository and seven performance measures are used to evaluate them. The results of the proposed CNNCIR method with ACO based feature selection reveals that it outperforms SMOTE in the majority of cases.

Convolutional neural network-based data anomaly detection considering class imbalance with limited data

  • Du, Yao;Li, Ling-fang;Hou, Rong-rong;Wang, Xiao-you;Tian, Wei;Xia, Yong
    • Smart Structures and Systems
    • /
    • v.29 no.1
    • /
    • pp.63-75
    • /
    • 2022
  • The raw data collected by structural health monitoring (SHM) systems may suffer multiple patterns of anomalies, which pose a significant barrier for an automatic and accurate structural condition assessment. Therefore, the detection and classification of these anomalies is an essential pre-processing step for SHM systems. However, the heterogeneous data patterns, scarce anomalous samples and severe class imbalance make data anomaly detection difficult. In this regard, this study proposes a convolutional neural network-based data anomaly detection method. The time and frequency domains data are transferred as images and used as the input of the neural network for training. ResNet18 is adopted as the feature extractor to avoid training with massive labelled data. In addition, the focal loss function is adopted to soften the class imbalance-induced classification bias. The effectiveness of the proposed method is validated using acceleration data collected in a long-span cable-stayed bridge. The proposed approach detects and classifies data anomalies with high accuracy.

Network Intrusion Detection Using Transformer and BiGRU-DNN in Edge Computing

  • Huijuan Sun
    • Journal of Information Processing Systems
    • /
    • v.20 no.4
    • /
    • pp.458-476
    • /
    • 2024
  • To address the issue of class imbalance in network traffic data, which affects the network intrusion detection performance, a combined framework using transformers is proposed. First, Tomek Links, SMOTE, and WGAN are used to preprocess the data to solve the class-imbalance problem. Second, the transformer is used to encode traffic data to extract the correlation between network traffic. Finally, a hybrid deep learning network model combining a bidirectional gated current unit and deep neural network is proposed, which is used to extract long-dependence features. A DNN is used to extract deep level features, and softmax is used to complete classification. Experiments were conducted on the NSLKDD, UNSWNB15, and CICIDS2017 datasets, and the detection accuracy rates of the proposed model were 99.72%, 84.86%, and 99.89% on three datasets, respectively. Compared with other relatively new deep-learning network models, it effectively improved the intrusion detection performance, thereby improving the communication security of network data.

Cross-Project Pooling of Defects for Handling Class Imbalance

  • Catherine, J.M.;Djodilatchoumy, S
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.10
    • /
    • pp.11-16
    • /
    • 2022
  • Applying predictive analytics to predict software defects has improved the overall quality and decreased maintenance costs. Many supervised and unsupervised learning algorithms have been used for defect prediction on publicly available datasets. Most of these datasets suffer from an imbalance in the output classes. We study the impact of class imbalance in the defect datasets on the efficiency of the defect prediction model and propose a CPP method for handling imbalances in the dataset. The performance of the methods is evaluated using measures like Matthew's Correlation Coefficient (MCC), Recall, and Accuracy measures. The proposed sampling technique shows significant improvement in the efficiency of the classifier in predicting defects.

Comparison of Loss Function for Multi-Class Classification of Collision Events in Imbalanced Black-Box Video Data (불균형 블랙박스 동영상 데이터에서 충돌 상황의 다중 분류를 위한 손실 함수 비교)

  • Euisang Lee;Seokmin Han
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.24 no.1
    • /
    • pp.49-54
    • /
    • 2024
  • Data imbalance is a common issue encountered in classification problems, stemming from a significant disparity in the number of samples between classes within the dataset. Such data imbalance typically leads to problems in classification models, including overfitting, underfitting, and misinterpretation of performance metrics. Methods to address this issue include resampling, augmentation, regularization techniques, and adjustment of loss functions. In this paper, we focus on loss function adjustment, particularly comparing the performance of various configurations of loss functions (Cross Entropy, Balanced Cross Entropy, two settings of Focal Loss: 𝛼 = 1 and 𝛼 = Balanced, Asymmetric Loss) on Multi-Class black-box video data with imbalance issues. The comparison is conducted using the I3D, and R3D_18 models.