• Title/Summary/Keyword: Class Imbalance Problem

Search Result 52

A Study on the Improvement of Image Classification Performance in the Defense Field through Cost-Sensitive Learning of Imbalanced Data (불균형데이터의 비용민감학습을 통한 국방분야 이미지 분류 성능 향상에 관한 연구)

  • Jeong, Miae;Ma, Jungmok
    • Journal of the Korea Institute of Military Science and Technology / v.24 no.3 / pp.281-292 / 2021
  • With the development of deep learning technology, researchers and technicians keep attempting to apply deep learning in various industrial and academic fields, including defense. Most of these attempts assume that the data are balanced. In reality, much data is imbalanced, so the classifier is not properly built and the model's performance can be low. Therefore, this study proposes cost-sensitive learning as a solution to the imbalanced data problem in image classification in the defense field. In the proposed model, cost-sensitive learning assigns a high weight to the minority class in the cost function. In 160 experiments using a submarine/non-submarine dataset and a warship/non-warship dataset, the test F1-score was higher with cost-sensitive learning than with general learning. Furthermore, statistical tests showed that this difference is significant.
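The abstract describes cost-sensitive learning as giving the minority class a higher weight in the cost function. The paper's exact weighting scheme is not stated here, so the following is only a minimal sketch that assumes inverse-frequency class weights and a weighted cross-entropy loss; the function names and the toy batch are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight class c by n_samples / (n_classes * count_c), so rarer classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Batch mean of -weights[y] * log(p[y]).

    probs:   per-sample lists of predicted class probabilities (index = class id)
    labels:  integer class ids
    weights: dict mapping class id -> cost weight
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += -weights[y] * math.log(p[y])
    return total / len(labels)


# Hypothetical batch: 9 non-submarine (class 0) images, 1 submarine (class 1) image.
labels = [0] * 9 + [1]
weights = inverse_frequency_weights(labels)
print(weights)  # class 1 gets 9x the weight of class 0
```

With these weights, misclassifying the single submarine image costs about nine times as much as misclassifying one non-submarine image, which pushes the classifier away from simply predicting the majority class.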

Intrusion Detection Approach using Feature Learning and Hierarchical Classification (특징학습과 계층분류를 이용한 침입탐지 방법 연구)

  • Han-Sung Lee;Yun-Hee Jeong;Se-Hoon Jung
    • The Journal of the Korea institute of electronic communication sciences / v.19 no.1 / pp.249-256 / 2024
  • Machine learning-based intrusion detection methodologies require a large, uniform amount of training data for each class to be classified, and the entire system must be retrained whenever a new attack type is added for detection or classification. In this paper, we use feature learning and hierarchical classification to address the classification and data imbalance problems with relatively little training data, and propose an intrusion detection methodology to which new attack types can easily be added. The feasibility of the proposed system was verified through experiments using KDD IDS data.
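The abstract does not specify the feature learner or the classifier used at each level. The sketch below assumes the features have already been learned (plain numeric vectors stand in for them) and uses hypothetical nearest-centroid classifiers at two levels: normal vs. attack, then attack type. It only illustrates why a hierarchy makes new attack types cheap to add, since registering one touches the second level alone. It is not the authors' implementation.

```python
import math


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest(x, centroids):
    """Return the label whose centroid is closest (Euclidean) to x."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


class HierarchicalDetector:
    """Level 1 separates normal vs. attack; level 2 picks an attack type."""

    def __init__(self, normal_rows, attack_rows_by_type):
        all_attacks = [r for rows in attack_rows_by_type.values() for r in rows]
        self.level1 = {"normal": centroid(normal_rows), "attack": centroid(all_attacks)}
        self.level2 = {t: centroid(rows) for t, rows in attack_rows_by_type.items()}

    def add_attack_type(self, name, rows):
        # Only the second level changes; nothing else is retrained in this sketch.
        self.level2[name] = centroid(rows)

    def predict(self, x):
        if nearest(x, self.level1) == "normal":
            return "normal"
        return nearest(x, self.level2)
```

In this simplification the first level is not refitted when a type is added, so a new attack type that lies closer to normal traffic than to the existing attack centroid would be missed; a practical system would likely revisit level 1 as well.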

Comparison of Semantic Segmentation Performance of U-Net according to the Ratio of Small Objects for Nuclear Activity Monitoring (핵활동 모니터링을 위한 소형객체 비율에 따른 U-Net의 의미론적 분할 성능 비교)

  • Lee, Jinmin;Kim, Taeheon;Lee, Changhui;Lee, Hyunjin;Song, Ahram;Han, Youkyung
    • Korean Journal of Remote Sensing / v.38 no.6_4 / pp.1925-1934 / 2022
  • Monitoring nuclear activity in inaccessible areas using remote sensing technology is essential for nuclear non-proliferation. In recent years, deep learning has been actively used to detect small objects related to nuclear activity. However, high-resolution satellite imagery containing small objects can suffer from class imbalance, which degrades small-object detection performance. Therefore, this study aims to improve detection accuracy by analyzing how the ratio of nuclear-activity-related small objects in the input data affects the performance of the deep learning model. To this end, six case datasets with different ratios of small-object pixels were generated, and a U-Net model was trained for each case. Each trained model was then evaluated quantitatively and qualitatively using a test dataset containing various types of small-object classes. The results confirm that adjusting the ratio of object pixels in the input images allows small objects related to nuclear activity to be detected efficiently. This study suggests that the performance of deep learning can be improved by adjusting the object pixel ratio of the input data in the training dataset.
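The study varies the ratio of small-object pixels across its six training sets. One simple way to control that ratio is sketched below, assuming each training tile comes with a binary mask (1 = object pixel): measure each tile's object pixel ratio and drop tiles below a threshold. The helper names and the threshold are hypothetical and are not the authors' procedure.

```python
def object_pixel_ratio(mask):
    """Fraction of pixels in a binary mask (list of rows) that belong to objects."""
    total = sum(len(row) for row in mask)
    objects = sum(1 for row in mask for v in row if v)
    return objects / total


def select_tiles(tiles, min_ratio):
    """Keep (name, mask) tiles whose object pixel ratio is at least min_ratio."""
    return [(name, m) for name, m in tiles if object_pixel_ratio(m) >= min_ratio]


def dataset_ratio(tiles):
    """Object pixel ratio of a whole dataset (all masks pooled together)."""
    total = sum(len(row) for _, m in tiles for row in m)
    objects = sum(1 for _, m in tiles for row in m for v in row if v)
    return objects / total
```

Dropping background-only tiles raises the pooled object pixel ratio; sweeping `min_ratio` is one hypothetical way to produce datasets that differ only in that ratio.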

A Hybrid Oversampling Technique for Imbalanced Structured Data based on SMOTE and Adapted CycleGAN (불균형 정형 데이터를 위한 SMOTE와 변형 CycleGAN 기반 하이브리드 오버샘플링 기법)

  • Jung-Dam Noh;Byounggu Choi
    • Information Systems Review / v.24 no.4 / pp.97-118 / 2022
  • As generative adversarial network (GAN) based oversampling techniques have achieved impressive results in addressing class imbalance in unstructured datasets such as images, many studies have begun to apply them to imbalanced structured datasets. However, these studies have failed to reflect the characteristics of structured data because they convert the data into an unstructured format. To overcome this limitation, this study adapted CycleGAN to reflect the characteristics of structured data and proposed a hybrid of the synthetic minority oversampling technique (SMOTE) and the adapted CycleGAN. In particular, unlike previous studies that used two-dimensional convolutional neural networks, this study uses a one-dimensional convolutional neural network. The proposed oversampling method was evaluated on various datasets and compared with existing oversampling methods such as SMOTE and adaptive synthetic sampling (ADASYN). The results indicated that the proposed hybrid oversampling method outperformed the existing methods when the data have more dimensions or a higher degree of imbalance. This suggests that classification performance on structured data can be improved by oversampling with the proposed hybrid method, which considers the characteristics of structured data.
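For reference, the SMOTE half of the hybrid works by interpolation: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. The sketch below shows only that SMOTE-style step (the adapted CycleGAN is not sketched); it picks the base sample at random rather than cycling through every minority sample as the original SMOTE does, and the function name and defaults are assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    For each new sample: pick a minority point, pick one of its k nearest
    minority neighbours, and place a point at a random position on the
    line segment between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

Because every new point lies between two existing minority points, plain SMOTE cannot leave the region the minority class already spans, which is one motivation for pairing it with a generative model.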

Comparison of Classification Performance Between Adult and Elderly Using Acoustic and Linguistic Features from Spontaneous Speech (자유대화의 음향적 특징 및 언어적 특징 기반의 성인과 노인 분류 성능 비교)

  • SeungHoon Han;Byung Ok Kang;Sunghee Dong
    • KIPS Transactions on Software and Data Engineering / v.12 no.8 / pp.365-370 / 2023
  • This paper aims to compare the performance of classifying speech data into two groups, adult and elderly, based on acoustic and linguistic characteristics that change with aging, such as respiratory patterns, phonation, pitch, frequency, and language expression ability. For acoustic features, we used attributes related to the frequency, amplitude, and spectrum of speech. For linguistic features, we extracted hidden state vector representations containing contextual information from transcriptions of speech utterances using KoBERT, a Korean pre-trained language model that has shown excellent performance in natural language processing tasks. We evaluated the classification performance of each model trained on acoustic or linguistic features and, after addressing the class imbalance problem by down-sampling, examined each model's F1 scores for the two classes, adult and elderly. The experimental results showed that linguistic features classified adult and elderly speakers better than acoustic features, and that even when the class proportions were equal, classification performance was higher for adults than for the elderly.
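The two evaluation steps the abstract mentions, balancing the classes by down-sampling and reporting an F1 score per class, can be sketched as follows. This assumes plain random down-sampling to the size of the smallest class; the paper's exact sampling procedure is not given, and the function names are hypothetical.

```python
import random


def downsample(samples, labels, seed=0):
    """Randomly drop samples from larger classes until every class
    has as many samples as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def f1_per_class(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for each class, treated in turn as positive."""
    scores = {}
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores
```

Reporting F1 per class, rather than a single accuracy figure, is what makes a gap such as "adult higher than elderly" visible even after the classes are balanced.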

Problems and Development of Police Officials' Physical Fitness Tests (경찰공무원 체력검정의 문제점 및 발전방안)

  • Kim, Sang-Woon
    • The Journal of the Korea Contents Association / v.19 no.8 / pp.609-619 / 2019
  • This study aims to present solutions to the problems of the police physical fitness test. It was prompted by a video clip titled "Darim-dong female police officer Assault" (Seoul, May), in which police officers trying to subdue a drunk man appeared rather lethargic, for example being pushed back in a physical struggle with the suspect. The police physical fitness test is criticized for consisting of items that are difficult to apply in real situations, even though it should be linked to job performance. The problems are that the test events are unrealistic, the standards are set too low, and the test's credibility is undermined by the imbalance of standards between men and women and by the vision culture of the testing methods. We hope that the Republic of Korea will become a world-class security powerhouse by raising its physical fitness standards and establishing a scientific fitness test system that prevents injuries and measures physical strength effectively.

Automatic Augmentation Technique of an Autoencoder-based Numerical Training Data (오토인코더 기반 수치형 학습데이터의 자동 증강 기법)

  • Jeong, Ju-Eun;Kim, Han-Joon;Chun, Jong-Hoon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.5 / pp.75-86 / 2022
  • This study aims to solve the problem of class imbalance in numerical data and to improve the performance of the learning model by augmenting the training data with a deep learning-based Variational AutoEncoder. We propose 'D-VAE' to artificially increase the number of records in given table data. In the proposed technique, the data are first optimized through discretization and feature selection during preprocessing. In the discretization step, values are grouped with K-means and then converted into one-hot vectors by one-hot encoding. Subsequently, for memory efficiency, sample data are generated with the Variational AutoEncoder using only the features that RFECV (recursive feature elimination with cross-validation), one of the feature selection techniques, identifies as helpful for prediction. To verify the performance of the proposed model, we demonstrate its validity through experiments at various data augmentation ratios.
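The discretization step described above (group a numeric column with K-means, then one-hot encode the cluster index) can be sketched for a single column as below. This is an illustration under stated assumptions, not the D-VAE code: the number of clusters `k` is a hypothetical parameter, the centres are initialised deterministically from the sorted data, and the RFECV and VAE stages are not shown.

```python
def kmeans_1d(values, k, iters=20):
    """Cluster scalar values into k groups with Lloyd's algorithm.

    Centres start at evenly spaced positions in the sorted data,
    so the result is deterministic.
    """
    ordered = sorted(values)
    centres = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            j = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[j].append(v)
        # Keep the old centre if a cluster ends up empty.
        centres = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
    return centres


def discretize_one_hot(values, k):
    """Replace each value by a one-hot vector of its nearest K-means centre."""
    centres = sorted(kmeans_1d(values, k))  # column order follows value order
    vectors = []
    for v in values:
        j = min(range(k), key=lambda i: abs(v - centres[i]))
        vectors.append([1 if i == j else 0 for i in range(k)])
    return vectors
```

For example, `discretize_one_hot([1.0, 1.2, 0.9, 10.0, 10.5, 9.8], 2)` maps the three small values to `[1, 0]` and the three large ones to `[0, 1]`.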

Comparison of Anomaly Detection Performance Based on GRU Model Applying Various Data Preprocessing Techniques and Data Oversampling (다양한 데이터 전처리 기법과 데이터 오버샘플링을 적용한 GRU 모델 기반 이상 탐지 성능 비교)

  • Yoo, Seung-Tae;Kim, Kangseok
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.2 / pp.201-211 / 2022
  • Following the recent change in the cybersecurity paradigm, research on anomaly detection using machine learning and deep learning, which are technologies for implementing AI, is increasing. In this study, data preprocessing techniques that can improve the anomaly detection performance of an intrusion detection model based on a GRU (Gated Recurrent Unit) neural network were compared using NGIDS-DS (Next Generation IDS Dataset), an open dataset. In addition, to address the class imbalance arising from the ratio of normal data to attack data, detection performance at different oversampling ratios was compared and analyzed using an oversampling technique based on DCGAN (Deep Convolutional Generative Adversarial Networks). In the experiments, preprocessing the system call and process execution path features with the Doc2Vec algorithm performed well, and oversampling with DCGAN improved detection performance.
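The abstract compares detection performance across oversampling ratios but does not define the ratio. The sketch below assumes it means the attack-to-normal ratio after oversampling, the same convention imbalanced-learn uses for a float `sampling_strategy`, and computes how many synthetic attack records are needed to reach it. The `generate` callable is a hypothetical stand-in for a trained DCGAN generator, which is not sketched here.

```python
import math


def synthetic_count(n_normal, n_attack, target_ratio):
    """Synthetic attack samples needed so that attack / normal >= target_ratio."""
    if target_ratio <= 0:
        raise ValueError("target_ratio must be positive")
    needed = math.ceil(target_ratio * n_normal) - n_attack
    return max(needed, 0)  # already at or above the target: add nothing


def oversample(attack_rows, n_normal, target_ratio, generate):
    """Append generate(n) synthetic rows to the real attack rows.

    generate: hypothetical callable returning n synthetic rows
    (for example, samples drawn from a trained generator).
    """
    n_new = synthetic_count(n_normal, len(attack_rows), target_ratio)
    return attack_rows + generate(n_new)
```

Sweeping `target_ratio` (for example 0.25, 0.5, 1.0) with a fixed generator is one way to reproduce the kind of ratio comparison the study describes.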

Effect of Urban Planning on Spatial Equity - An Analysis on the Accessibility Change to Urban Cultural Facilities by Income Class Factor in the Daejeon Metropolitan City Using GIS - (도시계획사업이 공간적 형평성에 미치는 효과 - GIS를 이용한 대전광역시 도로건설사업의 소득계층간 접근성 변화 분석 -)

  • Leem, Youn-Taik;Seo, Chang-Woo;Lee, Sang-Ho
    • Journal of the Korean Association of Geographic Information Studies / v.15 no.2 / pp.23-34 / 2012
  • As the quality of life improves, cultural facilities play an increasingly important role in urban areas. However, for various reasons, the locations of these facilities are geographically imbalanced across urban regions. Although provision of a road network could alleviate this kind of urban problem, in many countries the provision of urban infrastructure magnifies the cultural gap between regions and socio-economic classes. The findings of this study are as follows. First, inequality of accessibility to cultural facilities is observed throughout the period. Cross-sectional data show that the higher the income of a region, the higher the zone's accessibility index (AI) to cultural facilities at every point in time. Next, the provision of the road network contributes to improving the AI of high-income regions. Finally, new facilities tend to be located where they improve the AI of high-income zones. This means that decision making by the city government intensifies geographical inequality. These results would be useful in deciding the number and location of cultural facilities and other similar urban infrastructure. They will also help in selecting optimal locations that consider not only physical distance but also social equity.
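The abstract does not define its accessibility index. For illustration only, the sketch below assumes a common gravity-type form, A_i = sum over facilities j of S_j * exp(-beta * d_ij), where S_j is a facility size and d_ij a distance, and averages it per income group. The decay parameter `beta`, the use of straight-line rather than road-network distance, and all function names are assumptions, not the paper's method.

```python
import math


def accessibility_index(zone_xy, facilities, beta=0.5):
    """Gravity-type AI: sum of size * exp(-beta * distance) over facilities.

    facilities: list of ((x, y), size) pairs. Straight-line distance is an
    assumption; a road-network study would use travel distance instead.
    """
    total = 0.0
    for (fx, fy), size in facilities:
        d = math.dist(zone_xy, (fx, fy))
        total += size * math.exp(-beta * d)
    return total


def mean_ai_by_group(zones, facilities, beta=0.5):
    """zones: list of ((x, y), income_group). Returns the mean AI per group."""
    sums, counts = {}, {}
    for xy, group in zones:
        sums[group] = sums.get(group, 0.0) + accessibility_index(xy, facilities, beta)
        counts[group] = counts.get(group, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}
```

Comparing `mean_ai_by_group` before and after adding a facility (or shortening distances to mimic a new road) gives a simple way to see whether a change widens or narrows the gap between income groups.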

Support plan and analysis of demand for multicultural education using e-learning by marriage immigrants (이러닝 활용 다문화교육에 대한 결혼이민자의 수요 분석 및 지원 방안)

  • Ahn, Seong-Hun
    • Journal of The Korean Association of Information Education / v.16 no.1 / pp.131-142 / 2012
  • In this paper, a plan for supporting marriage immigrants through e-learning was studied. The number of Korean men marrying foreign women is currently increasing rapidly because of a growing gender-ratio imbalance caused by son preference and by Korean women's avoidance of rural areas. To alleviate this problem, the government runs various social adaptation programs such as Korean language education and vocational education. However, most marriage immigrants are not properly educated because they are burdened by household duties or work. To address this, a survey was conducted on marriage immigrants' intentions to receive multicultural education through e-learning. The results showed that most marriage immigrants strongly prefer e-learning, although their preferences differ by original nationality and region of residence. Based on these results, a support plan for multicultural education through e-learning was proposed. The plan consists of three parts. First, education for marriage immigrants will be specialized according to their original nationality. Second, greater emphasis will be placed on Korean language education. Third, vocational education that benefits marriage immigrants will be prepared. These plans are expected to help marriage immigrants settle as equal members of society, rather than falling into an underprivileged class, by giving them an opportunity to receive education.
