• Title/Summary/Keyword: class imbalance

Search Result 120, Processing Time 0.027 seconds

Analyzing Key Variables in Network Attack Classification on NSL-KDD Dataset using SHAP (SHAP 기반 NSL-KDD 네트워크 공격 분류의 주요 변수 분석)

  • Sang-duk Lee;Dae-gyu Kim;Chang Soo Kim
    • Journal of the Society of Disaster Information
    • /
    • v.19 no.4
    • /
    • pp.924-935
    • /
    • 2023
  • Purpose: The central aim of this study is to leverage machine learning techniques for the classification of Intrusion Detection System (IDS) data, with a specific focus on identifying the variables responsible for enhancing overall performance. Method: First, we classified 'R2L(Remote to Local)' and 'U2R (User to Root)' attacks in the NSL-KDD dataset, which are difficult to detect due to class imbalance, using seven machine learning models, including Logistic Regression (LR) and K-Nearest Neighbor (KNN). Next, we use the SHapley Additive exPlanation (SHAP) for two classification models that showed high performance, Random Forest (RF) and Light Gradient-Boosting Machine (LGBM), to check the importance of variables that affect classification for each model. Result: In the case of RF, the 'service' variable and in the case of LGBM, the 'dst_host_srv_count' variable were confirmed to be the most important variables. These pivotal variables serve as key factors capable of enhancing performance in the context of classification for each respective model. Conclusion: In conclusion, this paper successfully identifies the optimal models, RF and LGBM, for classifying 'R2L' and 'U2R' attacks, while elucidating the crucial variables associated with each selected model.

Intrusion Detection Approach using Feature Learning and Hierarchical Classification (특징학습과 계층분류를 이용한 침입탐지 방법 연구)

  • Han-Sung Lee;Yun-Hee Jeong;Se-Hoon Jung
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.1
    • /
    • pp.249-256
    • /
    • 2024
  • Machine learning-based intrusion detection methodologies require a large amount of uniform learning data for each class to be classified, and have the problem of having to retrain the entire system when adding an attack type to be detected or classified. In this paper, we use feature learning and hierarchical classification methods to solve classification problems and data imbalance problems using relatively little training data, and propose an intrusion detection methodology that makes it easy to add new attack types. The feasibility of the proposed system was verified through experiments using KDD IDS data..

A Class-C type Wideband Current-Reuse VCO With 2-Step Auto Amplitude Calibration(AAC) Loop (2 단계 자동 진폭 캘리브레이션 기법을 적용한 넓은 튜닝 범위를 갖는 클래스-C 타입 전류 재사용 전압제어발진기 설계)

  • Kim, Dongyoung;Choi, Jinwook;Lee, Dongsoo;Lee, Kang-Yoon
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.11
    • /
    • pp.94-100
    • /
    • 2014
  • In this paper, a design of low power Current-Reuse Voltage Controlled Oscillator (VCO) which has wide tuning range about 1.95 GHz ~ 3.15 GHz is presented. Class-C type is applied to improve phase noise and 2-Step Auto Amplitude Calibration (AAC) is used for minimizing the imbalance of differential VCO output voltage which is main issue of Current-Reuse VCO. The mismatch of differential VCO output voltage is presented about 1.5mV ~ 4.5mV. This mismatch is within 0.6 % compared with VCO output voltage. Proposed Current-Reuse VCO is designed using CMOS $0.13{\mu}m$ process. Supply voltage is 1.2 V and current consumption is 2.6 mA at center frequency. The phase noise is -116.267 dBc/Hz at 2.3GHz VCO frequency at 1MHz offset. The layout size is $720{\times}580{\mu}m^2$.

The Performance Improvement of U-Net Model for Landcover Semantic Segmentation through Data Augmentation (데이터 확장을 통한 토지피복분류 U-Net 모델의 성능 개선)

  • Baek, Won-Kyung;Lee, Moung-Jin;Jung, Hyung-Sup
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_2
    • /
    • pp.1663-1676
    • /
    • 2022
  • Recently, a number of deep-learning based land cover segmentation studies have been introduced. Some studies denoted that the performance of land cover segmentation deteriorated due to insufficient training data. In this study, we verified the improvement of land cover segmentation performance through data augmentation. U-Net was implemented for the segmentation model. And 2020 satellite-derived landcover dataset was utilized for the study data. The pixel accuracies were 0.905 and 0.923 for U-Net trained by original and augmented data respectively. And the mean F1 scores of those models were 0.720 and 0.775 respectively, indicating the better performance of data augmentation. In addition, F1 scores for building, road, paddy field, upland field, forest, and unclassified area class were 0.770, 0.568, 0.433, 0.455, 0.964, and 0.830 for the U-Net trained by original data. It is verified that data augmentation is effective in that the F1 scores of every class were improved to 0.838, 0.660, 0.791, 0.530, 0.969, and 0.860 respectively. Although, we applied data augmentation without considering class balances, we find that data augmentation can mitigate biased segmentation performance caused by data imbalance problems from the comparisons between the performances of two models. It is expected that this study would help to prove the importance and effectiveness of data augmentation in various image processing fields.

Total Dietary Fiber and Mineral Absorption

  • Gordon, Dennis-T.
    • Journal of Nutrition and Health
    • /
    • v.25 no.6
    • /
    • pp.429-449
    • /
    • 1992
  • The consumption of foods rich in TDF should not be associated with impaired mineral absorp-tion and long-term mineral status. In surveys of populations consuming high amounts of TDF e.g Third World populations and vegetarinas gross deficiencies in mineral nutrition have not been noted. If mineral status is low among these groups it is most likely caused by the inadequacy or imbalance of the diet and not by the TDF. The key word is interaction which should be inte-rpreted in dietary imbalances that produce nut-rient deficiencies. There are no strong data to support the concept that TDF inhibits mineral absorption through a binding chelation mechanism. Limited data sug-gest that positively charged groups on polymers such as chitosan and cholestyramine will decrease iron absorption in humans and animals. Because TDF does not contain positively charged groups future research should be directed at the possible role of protein consumed along with TDF and the combination of effects on mineral nutrition Phytic acid is acknowledged as a potent chela-tor of zinc. However its association with zinc and its propensity to lower Zn bioavaiability may enhance the absorption of other elements notably copper and iron. The importance of interactions among nutrients including TDF will gain addi-tional attention in the scientific community. Soluble and insoluble dietary fiber function di-fferently in the intestine. Insoluble fibers accele-rate movement through the intestine. Soluble die-tary fibers appear to regulated blood concentra-tions of glucose and cholesterol albeit by some unknown mechanism. In creased viscosity produ-ced by the SDF in the intestine may provide an explanation of how this class of polymers affects plasma glucose cholesterol and other nutrients. Employing a double-perfusion technique in the rat we demonstrated that viscosity produced by SDF will delay transfer of zinc into the circulatory system. This delayed absorption should not be interpreted as decreased utilization. A great deal of additional research is required to prove the importance of luminaly viscosity produced by SDF on slowing nutrient absorption or regulating bllod nutrient homeostasis. Increased intake of TDF in the total human diet appears desirable. A dietary intake of 35g/day should not be considered to have a negative effect on mineral absorption. It is important to educate people that an intake of more than 35g TDF/day may cause an imbalance in the diet that can adve-rsely affect mineral utilization. Acknowledgments. Appreciation is given to Dr. George V. Vahouny(deceased) who was intense a great competitor in and out of science and who gave the author inspiration Portions of this work were supported by the University of Missouri Ag-ricultural Station and by a grant from the Univer-rch Support Grant RR 07053 from the National Institutes of Health. Contribution of the Missouri Agriculatural Experiments Station Journal Series No. 10747.

  • PDF

A Method of Machine Learning-based Defective Health Functional Food Detection System for Efficient Inspection of Imported Food (효율적 수입식품 검사를 위한 머신러닝 기반 부적합 건강기능식품 탐지 방법)

  • Lee, Kyoungsu;Bak, Yerin;Shin, Yoonjong;Sohn, Kwonsang;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.3
    • /
    • pp.139-159
    • /
    • 2022
  • As interest in health functional foods has increased since COVID-19, the importance of imported food safety inspections is growing. However, in contrast to the annual increase in imports of health functional foods, the budget and manpower required for inspections for import and export are reaching their limit. Hence, the purpose of this study is to propose a machine learning model that efficiently detects unsuitable food suitable for the characteristics of data possessed by government offices on imported food. First, the components of food import/export inspections data that affect the judgment of nonconformity were examined and derived variables were newly created. Second, in order to select features for the machine learning, class imbalance and nonlinearity were considered when performing exploratory analysis on imported food-related data. Third, we try to compare the performance and interpretability of each model by applying various machine learning techniques. In particular, the ensemble model was the best, and it was confirmed that the derived variables and models proposed in this study can be helpful to the system used in import/export inspections.

Feature Selection for Anomaly Detection Based on Genetic Algorithm (유전 알고리즘 기반의 비정상 행위 탐지를 위한 특징선택)

  • Seo, Jae-Hyun
    • Journal of the Korea Convergence Society
    • /
    • v.9 no.7
    • /
    • pp.1-7
    • /
    • 2018
  • Feature selection, one of data preprocessing techniques, is one of major research areas in many applications dealing with large dataset. It has been used in pattern recognition, machine learning and data mining, and is now widely applied in a variety of fields such as text classification, image retrieval, intrusion detection and genome analysis. The proposed method is based on a genetic algorithm which is one of meta-heuristic algorithms. There are two methods of finding feature subsets: a filter method and a wrapper method. In this study, we use a wrapper method, which evaluates feature subsets using a real classifier, to find an optimal feature subset. The training dataset used in the experiment has a severe class imbalance and it is difficult to improve classification performance for rare classes. After preprocessing the training dataset with SMOTE, we select features and evaluate them with various machine learning algorithms.

Pattern Analysis of Traffic Accident data and Prediction of Victim Injury Severity Using Hybrid Model (교통사고 데이터의 패턴 분석과 Hybrid Model을 이용한 피해자 상해 심각도 예측)

  • Ju, Yeong Ji;Hong, Taek Eun;Shin, Ju Hyun
    • Smart Media Journal
    • /
    • v.5 no.4
    • /
    • pp.75-82
    • /
    • 2016
  • Although Korea's economic and domestic automobile market through the change of road environment are growth, the traffic accident rate has also increased, and the casualties is at a serious level. For this reason, the government is establishing and promoting policies to open traffic accident data and solve problems. In this paper, describe the method of predicting traffic accidents by eliminating the class imbalance using the traffic accident data and constructing the Hybrid Model. Using the original traffic accident data and the sampled data as learning data which use FP-Growth algorithm it learn patterns associated with traffic accident injury severity. Accordingly, In this paper purpose a method for predicting the severity of a victim of a traffic accident by analyzing the association patterns of two learning data, we can extract the same related patterns, when a decision tree and multinomial logistic regression analysis are performed, a hybrid model is constructed by assigning weights to related attributes.

Problems and Development of Police Officials' Physical Fitness Tests (경찰공무원 체력검정의 문제점 및 발전방안)

  • Kim, Sang-Woon
    • The Journal of the Korea Contents Association
    • /
    • v.19 no.8
    • /
    • pp.609-619
    • /
    • 2019
  • The study aims to present a solution to the problem of police physical fitness tests as police officers, who tried to subdue the drunk in a video clip titled "Darim-dong female police officer Assault" in May in Seoul, showed a rather lethargic figure, such as being pushed out of a physical fight with the suspect. The police physical fitness test is subject to criticism as it consists of items that are difficult to apply in real life despite having to be linked to job performance. The problem is that the physical fitness test events are not realistic, the physical fitness test standards are set too low, and their credibility is not reliable due to the imbalance of standards between men and women and the vision culture of physical fitness testing methods. First of all, we hope that the Republic of Korea will become a world-class security powerhouse by upgrading its physical fitness standards and establishing a scientific fitness test system for preventing injuries and effectively measuring physical strength.

CAB: Classifying Arrhythmias based on Imbalanced Sensor Data

  • Wang, Yilin;Sun, Le;Subramani, Sudha
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.7
    • /
    • pp.2304-2320
    • /
    • 2021
  • Intelligently detecting anomalies in health sensor data streams (e.g., Electrocardiogram, ECG) can improve the development of E-health industry. The physiological signals of patients are collected through sensors. Timely diagnosis and treatment save medical resources, promote physical health, and reduce complications. However, it is difficult to automatically classify the ECG data, as the features of ECGs are difficult to extract. And the volume of labeled ECG data is limited, which affects the classification performance. In this paper, we propose a Generative Adversarial Network (GAN)-based deep learning framework (called CAB) for heart arrhythmia classification. CAB focuses on improving the detection accuracy based on a small number of labeled samples. It is trained based on the class-imbalance ECG data. Augmenting ECG data by a GAN model eliminates the impact of data scarcity. After data augmentation, CAB classifies the ECG data by using a Bidirectional Long Short Term Memory Recurrent Neural Network (Bi-LSTM). Experiment results show a better performance of CAB compared with state-of-the-art methods. The overall classification accuracy of CAB is 99.71%. The F1-scores of classifying Normal beats (N), Supraventricular ectopic beats (S), Ventricular ectopic beats (V), Fusion beats (F) and Unclassifiable beats (Q) heartbeats are 99.86%, 97.66%, 99.05%, 98.57% and 99.88%, respectively. Unclassifiable beats (Q) heartbeats are 99.86%, 97.66%, 99.05%, 98.57% and 99.88%, respectively.