• Title/Summary/Keyword: 클래스 불균형

Search Result 89, Processing Time 0.028 seconds

A Study on the Classification of Fault Motors using Sound Data (소리 데이터를 이용한 불량 모터 분류에 관한 연구)

  • Il-Sik, Chang;Gooman, Park
    • Journal of Broadcast Engineering
    • /
    • v.27 no.6
    • /
    • pp.885-896
    • /
    • 2022
  • Motor failure in manufacturing plays an important role in future A/S and reliability. Motor failure is detected by measuring sound, current, and vibration. For the data used in this paper, the sound of the car's side mirror motor gear box was used. Motor sound consists of three classes. Sound data is input to the network model through a conversion process through MelSpectrogram. In this paper, various methods were applied, such as data augmentation to improve the performance of classifying fault motors and various methods according to class imbalance were applied resampling, reweighting adjustment, change of loss function and representation learning and classification into two stages. In addition, the curriculum learning method and self-space learning method were compared through a total of five network models such as Bidirectional LSTM Attention, Convolutional Recurrent Neural Network, Multi-Head Attention, Bidirectional Temporal Convolution Network, and Convolution Neural Network, and the optimal configuration was found for motor sound classification.

Comparison of Classification Performance Between Adult and Elderly Using Acoustic and Linguistic Features from Spontaneous Speech (자유대화의 음향적 특징 및 언어적 특징 기반의 성인과 노인 분류 성능 비교)

  • SeungHoon Han;Byung Ok Kang;Sunghee Dong
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.8
    • /
    • pp.365-370
    • /
    • 2023
  • This paper aims to compare the performance of speech data classification into two groups, adult and elderly, based on the acoustic and linguistic characteristics that change due to aging, such as changes in respiratory patterns, phonation, pitch, frequency, and language expression ability. For acoustic features we used attributes related to the frequency, amplitude, and spectrum of speech voices. As for linguistic features, we extracted hidden state vector representations containing contextual information from the transcription of speech utterances using KoBERT, a Korean pre-trained language model that has shown excellent performance in natural language processing tasks. The classification performance of each model trained based on acoustic and linguistic features was evaluated, and the F1 scores of each model for the two classes, adult and elderly, were examined after address the class imbalance problem by down-sampling. The experimental results showed that using linguistic features provided better performance for classifying adult and elderly than using acoustic features, and even when the class proportions were equal, the classification performance for adult was higher than that for elderly.

The Performance Improvement of U-Net Model for Landcover Semantic Segmentation through Data Augmentation (데이터 확장을 통한 토지피복분류 U-Net 모델의 성능 개선)

  • Baek, Won-Kyung;Lee, Moung-Jin;Jung, Hyung-Sup
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_2
    • /
    • pp.1663-1676
    • /
    • 2022
  • Recently, a number of deep-learning based land cover segmentation studies have been introduced. Some studies denoted that the performance of land cover segmentation deteriorated due to insufficient training data. In this study, we verified the improvement of land cover segmentation performance through data augmentation. U-Net was implemented for the segmentation model. And 2020 satellite-derived landcover dataset was utilized for the study data. The pixel accuracies were 0.905 and 0.923 for U-Net trained by original and augmented data respectively. And the mean F1 scores of those models were 0.720 and 0.775 respectively, indicating the better performance of data augmentation. In addition, F1 scores for building, road, paddy field, upland field, forest, and unclassified area class were 0.770, 0.568, 0.433, 0.455, 0.964, and 0.830 for the U-Net trained by original data. It is verified that data augmentation is effective in that the F1 scores of every class were improved to 0.838, 0.660, 0.791, 0.530, 0.969, and 0.860 respectively. Although, we applied data augmentation without considering class balances, we find that data augmentation can mitigate biased segmentation performance caused by data imbalance problems from the comparisons between the performances of two models. It is expected that this study would help to prove the importance and effectiveness of data augmentation in various image processing fields.

Ensemble Size Reduction in Fraud Detection System (축소된 앙상블에 의한 부정행위 적발 모형)

  • Song, Yeong-Mi;Ji, Won-Cheol;Han, Wan-Gyu
    • 한국경영정보학회:학술대회논문집
    • /
    • 2007.06a
    • /
    • pp.597-602
    • /
    • 2007
  • 데이터 마이닝 분야에서 앙상블 모형의 유용성은 널리 인정되고 있다. 앙상블을 구성하는 단위모형들 사이의 다양성이 보장되는 경우, 최종 모형의 정확성 및 안정성이 향상되기 때문이다. 하지만, 얼마나 많은 단위 모형들이 어떤 방식으로 결합되어야 하는가에 대해서는 아직도 더 많은 연구가 필요하다. 본 연구에서는 신용카드 부정사용 유형 중 하나인 현금불법융통 문제에 대해 앙상블 모형의 유용성을 검증하고자 한다. 부정행위 적발 모형은 전형적인 분류 문제의 한 유형이나, 클래스간 불균형이 매우 심하다는 특징이 있다. 따라서, 현금불법융통 문제에 적합한 다양성(Diversity) 척도를 개발하여 최소한의 단위모형들로 앙상블 모형을 구성하는 방안을 제시하였다. 축소된 앙상블 모형이 많은 수의 모형을 결합한 앙상블 모형과 거의 같은 정확성 및 안정성을 보임을 국내 신용카드사의 실제 자료를 사용하여 입증하였다.

  • PDF

Image-based malware classification system using image preprocessing and ensemble techniques (이미지 전처리와 앙상블 기법을 이용한 이미지 기반 악성코드 분류 시스템)

  • Kim, Hae-Soo;Kim, Mi-hui
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.11a
    • /
    • pp.715-718
    • /
    • 2021
  • 정보통신 기술이 발전함에 따라 악의적인 공격을 통해 보안문제를 발생시키고 있다. 또한 새로운 악성코드가 유포되어 기존의 시그니처 비교방식은 새롭게 발생하는 악성코드를 빠르게 분석 할 수 없다. 새로운 악성코드를 빠르게 분석하고 방어기법을 제안하기 위해 악성코드의 패밀리를 분류할 필요가 있다. 본 논문에서는 악성코드의 바이너리 파일을 이용해 시각화하고 CNN모델을 통해 분류한다. 또한 정확도를 높이기 위해 LBP, HOG를 통해 악성코드 이미지에서 중요한 특성을 찾고 데이터 클래스 불균형에서 오는 문제를 앙상블 모델을 통해 해결하는 시스템을 제안한다.

H-PaDiM : Anomaly Segmentation Performance Analysis Based on PaDiM-Based Homogeneous Ensemble Method (H-PaDiM : PaDiM 기반 동종 앙상블 기법에 따른 이상 탐지성능 분석)

  • Kim, InKi;Gwak, Jeonghwan
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2022.07a
    • /
    • pp.95-97
    • /
    • 2022
  • 본 논문에서는 산업 현장에서 발생하는 불량품 탐지 분야에서 효율적으로 생산품의 불량을 탐지할 수 있는 PaDiM 구조의 Backbone 모델을 단일 Wide-ResNet 대신 두 개의 Wide-ResNet을 사용함으로써, 단일 모델에서 추출된 저차원의 Feature를 앙상블을 통해 성능 향상을 일으킬 수 있는 것을 증명하였다. 단일 Wide-ResNet 환경에서는 MVTec 데이터셋에서 생성된 다변량 가우시안 분포가 데이터셋의 적은 샘플수로 인하여 각 클래스 간 불균형이 발생하는 문제를 동종 앙상블을 통해 해결할 수 있었다. 따라서 본 논문에서는 제안하는 동종 모델의 앙상블을 사용함으로써 기존의 One-class classification 환경에서 불량품 탐지환경에서 적은 수의 데이터 샘플 환경에서 성능 향상을 나타낼 수 있음을 입증하였다.

  • PDF

Performance comparison between Decision tree model and TabNet for loan repayment prediction (대출 상환 예측을 위한 의사결정나무모델과 TabNet 간 성능 비교)

  • Sujin Han;Hyeoncheol Kim
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.11a
    • /
    • pp.453-455
    • /
    • 2023
  • 본 연구는 은행에서 리스크 관리 자동화를 위해 고객의 대출 상환 여부 예측 모델을 제안하고자 한다. 예측 모델로 금융 데이터 같은 정형데이터에서 전통적으로 높은 성능을 보인 의사결정나무기반 모델 LightGBM, CatBoost, XGB 와 최근 제안된 정형데이터에서 사용할 수 있는 설명 가능한 딥러닝 기반 모델 TabNet 간의 성능 비교를 진행한다. 다만, 대출 상환 여부 데이터는 불균형 클래스 데이터로 구성되어있어 샘플링을 진행한다. SMOTE, Random Under Sampling, 혼합 방식을 비교해 가장 높은 성능의 샘플링 기법을 제안한다. 대출 상환 여부 예측 결과 TabNet 모델이 의사결정나무모델들보다 좋은 성능을 보여 정형데이터에서 의사결정나무 기반 모델을 딥러닝 모델이 대체 할 수 있는 가능성을 확인했다.

Generation of High-Resolution Chest X-rays using Multi-scale Conditional Generative Adversarial Network with Attention (주목 메커니즘 기반의 멀티 스케일 조건부 적대적 생성 신경망을 활용한 고해상도 흉부 X선 영상 생성 기법)

  • Ann, Kyeongjin;Jang, Yeonggul;Ha, Seongmin;Jeon, Byunghwan;Hong, Youngtaek;Shim, Hackjoon;Chang, Hyuk-Jae
    • Journal of Broadcast Engineering
    • /
    • v.25 no.1
    • /
    • pp.1-12
    • /
    • 2020
  • In the medical field, numerical imbalance of data due to differences in disease prevalence is a common problem. It reduces the performance of a artificial intelligence network, leading to difficulties in learning a network with good performance. Recently, generative adversarial network (GAN) technology has been introduced as a way to address this problem, and its ability has been demonstrated by successful applications in various fields. However, it is still difficult to achieve good results in solving problems with performance degraded by numerical imbalances because the image resolution of the previous studies is not yet good enough and the structure in the image is modeled locally. In this paper, we propose a multi-scale conditional generative adversarial network based on attention mechanism, which can produce high resolution images to solve the numerical imbalance problem of chest X-ray image data. The network was able to produce images for various diseases by controlling condition variables with only one network. It's efficient and effective in that the network don't need to be learned independently for all disease classes and solves the problem of long distance dependency in image generation with self-attention mechanism.

Effective Harmony Search-Based Optimization of Cost-Sensitive Boosting for Improving the Performance of Cross-Project Defect Prediction (교차 프로젝트 결함 예측 성능 향상을 위한 효과적인 하모니 검색 기반 비용 민감 부스팅 최적화)

  • Ryu, Duksan;Baik, Jongmoon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.7 no.3
    • /
    • pp.77-90
    • /
    • 2018
  • Software Defect Prediction (SDP) is a field of study that identifies defective modules. With insufficient local data, a company can exploit Cross-Project Defect Prediction (CPDP), a way to build a classifier using dataset collected from other companies. Most machine learning algorithms for SDP have used more than one parameter that significantly affects prediction performance depending on different values. The objective of this study is to propose a parameter selection technique to enhance the performance of CPDP. Using a Harmony Search algorithm (HS), our approach tunes parameters of cost-sensitive boosting, a method to tackle class imbalance causing the difficulty of prediction. According to distributional characteristics, parameter ranges and constraint rules between parameters are defined and applied to HS. The proposed approach is compared with three CPDP methods and a Within-Project Defect Prediction (WPDP) method over fifteen target projects. The experimental results indicate that the proposed model outperforms the other CPDP methods in the context of class imbalance. Unlike the previous researches showing high probability of false alarm or low probability of detection, our approach provides acceptable high PD and low PF while providing high overall performance. It also provides similar performance compared with WPDP.

Prediction of Protein-Protein Interaction Sites Based on 3D Surface Patches Using SVM (SVM 모델을 이용한 3차원 패치 기반 단백질 상호작용 사이트 예측기법)

  • Park, Sung-Hee;Hansen, Bjorn
    • The KIPS Transactions:PartD
    • /
    • v.19D no.1
    • /
    • pp.21-28
    • /
    • 2012
  • Predication of protein interaction sites for monomer structures can reduce the search space for protein docking and has been regarded as very significant for predicting unknown functions of proteins from their interacting proteins whose functions are known. In the other hand, the prediction of interaction sites has been limited in crystallizing weakly interacting complexes which are transient and do not form the complexes stable enough for obtaining experimental structures by crystallization or even NMR for the most important protein-protein interactions. This work reports the calculation of 3D surface patches of complex structures and their properties and a machine learning approach to build a predictive model for the 3D surface patches in interaction and non-interaction sites using support vector machine. To overcome classification problems for class imbalanced data, we employed an under-sampling technique. 9 properties of the patches were calculated from amino acid compositions and secondary structure elements. With 10 fold cross validation, the predictive model built from SVM achieved an accuracy of 92.7% for classification of 3D patches in interaction and non-interaction sites from 147 complexes.