• Title/Summary/Keyword: Oversampling method

A Study on the Adjustment of Posterior Probability for Oversampling when the Target is Rare (목표 범주가 희귀한 자료의 과대표본추출에 대한 연구)

  • Kim, U.N.;Lee, S.K.;Choi, J.H.
    • The Korean Journal of Applied Statistics / v.24 no.3 / pp.477-484 / 2011
  • When the target event is rare, a widespread strategy is to build the model on an over-sampled data set, that is, a sample that disproportionately over-represents the events. Predicted values from a model fit to over-sampled data are biased, but they can easily be corrected to represent the population. In this study, we investigate the relationship between the proportion of rare events in a data mart and model performance, using real-world data from a Korean credit card company. We also apply two methods for adjusting the posterior probability for over-sampled data: the offset method and the weighted method. Finally, we compare the performance of these methods on real data sets.
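
The bias correction mentioned in this abstract can be illustrated with the generic prior-correction formula for over-sampled data. The sketch below is not taken from the paper: the function names and the parameters `pop_rate` and `sample_rate` are assumptions introduced for illustration, and the paper's offset and weighted methods may differ in detail (for instance, an offset is usually included while the model is being fit rather than applied afterwards).

```python
import math


def adjust_posterior(p_sample: float, pop_rate: float, sample_rate: float) -> float:
    """Map a probability predicted on over-sampled data back to the population scale.

    p_sample    : event probability from the model fit on over-sampled data
    pop_rate    : event proportion in the population (assumed known)
    sample_rate : event proportion in the over-sampled training data
    """
    # Re-weight each class by the ratio of its population share to its sample share.
    event = p_sample * pop_rate / sample_rate
    non_event = (1.0 - p_sample) * (1.0 - pop_rate) / (1.0 - sample_rate)
    return event / (event + non_event)


def adjust_posterior_via_offset(p_sample: float, pop_rate: float, sample_rate: float) -> float:
    """Same correction on the logit scale: subtract a constant log-odds offset."""
    offset = math.log(sample_rate * (1.0 - pop_rate) / ((1.0 - sample_rate) * pop_rate))
    logit = math.log(p_sample / (1.0 - p_sample)) - offset
    return 1.0 / (1.0 + math.exp(-logit))
```

With a 2% population event rate and a 50/50 over-sampled training set, a predicted probability of 0.5 maps back to 0.02 under this correction, and both functions agree because the offset is simply the log of the odds ratio between the sample and the population.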

Oversampling-Based Ensemble Learning Methods for Imbalanced Data (불균형 데이터 처리를 위한 과표본화 기반 앙상블 학습 기법)

  • Kim, Kyung-Min;Jang, Ha-Young;Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices / v.20 no.10 / pp.549-554 / 2014
  • Handwritten character recognition data are usually imbalanced because they are collected from natural-language sentences written by different writers. Imbalanced data can seriously degrade the performance of most machine learning algorithms. However, this problem is typically ignored in handwritten character recognition, because most of the difficulty is attributed to the high variance in the data set and the similar shapes of characters. We propose oversampling-based ensemble learning methods to solve the imbalanced data problem in handwritten character recognition and to improve recognition accuracy. We also show empirically that the proposed method improves the recognition accuracy of minor classes as well as the overall recognition accuracy.

Experimental Comparison of Network Intrusion Detection Models Solving Imbalanced Data Problem (데이터의 불균형성을 제거한 네트워크 침입 탐지 모델 비교 분석)

  • Lee, Jong-Hwa;Bang, Jiwon;Kim, Jong-Wouk;Choi, Mi-Jung
    • KNOM Review / v.23 no.2 / pp.18-28 / 2020
  • With the development of the virtual community, the benefits that IT provides to people in fields such as healthcare, industry, communication, and culture are increasing, and the quality of life is improving as well. At the same time, various malicious attacks target this developed network environment. Firewalls and intrusion detection systems exist to detect these attacks in advance, but they are limited in detecting malicious attacks that evolve day by day. To address this problem, intrusion detection research using machine learning is being actively conducted, but false positives and false negatives occur because the training dataset is imbalanced. In this paper, a Random Oversampling method is used to solve the imbalance problem of the UNSW-NB15 dataset used for network intrusion detection. Through experiments, we compare and analyze the accuracy, precision, recall, F1-score, training and prediction time, and hardware resource consumption of the models. Building on this study of the Random Oversampling method, we aim to develop a more efficient network intrusion detection model using other methods and high-performance models that can solve the imbalanced data problem.
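
As a rough illustration of the Random Oversampling idea named in this abstract, the sketch below duplicates randomly chosen minority-class samples until every class is as large as the largest one. It is a minimal standard-library example, not the authors' pipeline; the function name `random_oversample` and its `seed` parameter are assumptions made here.

```python
import random


def random_oversample(samples, labels, seed=0):
    """Balance classes by re-drawing minority samples with replacement."""
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y
```

Oversampling of this kind is normally applied to the training split only, so that duplicated records do not leak into the evaluation data.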

Multichannel Blind Equalization using Multistep Prediction and Adaptive Implementation

  • Ahn, Kyung-Seung;Hwang, Ho-Sun;Hwang, Tae-Jin;Baik, Heung-Ki
    • Proceedings of the IEEK Conference / 2001.06a / pp.69-72 / 2001
  • Blind equalization of transmission channels is important in communications and signal processing applications because it requires neither a training sequence nor a priori channel information. Recent solutions to this problem, such as those proposed by Tong et al., exploit the diversity induced by an antenna array or time oversampling, leading to second-order statistics techniques, for example, the subspace method and the prediction error method. The linear prediction error method is perhaps the most attractive in practice because it is insensitive to blind equalizer length mismatch and has a simple adaptive filter implementation. Unfortunately, the earlier one-step prediction error method is known to be limited with respect to arbitrary delay. In this paper, we derive the optimal delay and propose an adaptive blind equalizer with multi-step linear prediction using an RLS-type algorithm. Simulation results are presented to demonstrate the proposed algorithm and to compare it with existing algorithms.

PAPR reduction of OFDM systems using H-SLM method with a multiplierless IFFT/FFT technique

  • Sivadas, Namitha A.
    • ETRI Journal / v.44 no.3 / pp.379-388 / 2022
  • This study proposes a novel low-complexity algorithm for computing inverse fast Fourier transform (IFFT)/fast Fourier transform (FFT) operations in binary phase shift keying-modulated orthogonal frequency division multiplexing (OFDM) communication systems without requiring any twiddle factor multiplications. The peak-to-average power ratio (PAPR) reduction capacity of an efficient PAPR reduction technique, that is, H-SLM method, is evaluated using the proposed IFFT algorithm without any complex multiplications, and the impact of oversampling factor for the accurate calculation of PAPR is analyzed. The power spectral density of an OFDM signal generated using the proposed multiplierless IFFT algorithm is also examined. Moreover, the bit-error-rate performance of the H-SLM technique with the proposed IFFT/FFT algorithm is compared with the classical methods. Simulation results show that the proposed IFFT/FFT algorithm used in the H-SLM method requires no complex multiplications, thereby minimizing power consumption as well as the area of IFFT/FFT processors used in OFDM communication systems.
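
To make the role of the oversampling factor in PAPR calculation concrete, the sketch below zero-pads the frequency-domain symbols before an inverse DFT and then takes the ratio of peak power to mean power. It uses a plain O(N²) inverse DFT from the standard library, not the multiplierless IFFT or the H-SLM technique proposed in the paper; the function names and the default factor of 4 are assumptions for illustration.

```python
import cmath
import math


def oversampled_idft(freq_symbols, factor):
    """Inverse DFT after inserting (factor - 1) * N zeros in the middle of the spectrum."""
    n = len(freq_symbols)
    half = n // 2
    padded = list(freq_symbols[:half]) + [0j] * ((factor - 1) * n) + list(freq_symbols[half:])
    m = len(padded)
    return [
        sum(padded[k] * cmath.exp(2j * math.pi * k * t / m) for k in range(m)) / m
        for t in range(m)
    ]


def papr_db(freq_symbols, factor=4):
    """Peak-to-average power ratio, in dB, of the oversampled time-domain signal."""
    signal = oversampled_idft(freq_symbols, factor)
    powers = [abs(s) ** 2 for s in signal]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))
```

Without oversampling, peaks that fall between the Nyquist-rate samples can be missed, which is why a finer time grid gives a closer approximation of the continuous-time PAPR. As a sanity check, N identical symbols add coherently at the first sample, so `papr_db([1] * 8)` equals 10·log10(8) dB.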

Response Modeling with Semi-Supervised Support Vector Regression (준지도 지지 벡터 회귀 모델을 이용한 반응 모델링)

  • Kim, Dong-Il
    • Journal of the Korea Society of Computer and Information / v.19 no.9 / pp.125-139 / 2014
  • In this paper, I propose a response modeling method based on a Semi-Supervised Support Vector Regression (SS-SVR) algorithm. To increase the accuracy and profit of response modeling, unlabeled data in the customer dataset are used together with the labeled data during training. The proposed SS-SVR algorithm is designed as a batch learning algorithm to reduce training complexity. The label distributions of the unlabeled data are estimated in order to account for labeling uncertainty. Then, multiple training data are generated by oversampling from the unlabeled data and their estimated label distributions, and are combined with the labeled data to construct the training dataset. Finally, a data selection algorithm, Expected Margin based Pattern Selection (EMPS), is employed to reduce training complexity. Experiments conducted on a real-world marketing dataset showed that the proposed response modeling method trained efficiently and improved both accuracy and expected profit.

Research on Fault Diagnosis of Wind Power Generator Blade Based on SC-SMOTE and kNN

  • Peng, Cheng;Chen, Qing;Zhang, Longxin;Wan, Lanjun;Yuan, Xinpan
    • Journal of Information Processing Systems / v.16 no.4 / pp.870-881 / 2020
  • Because SCADA monitoring data from wind turbines are large and change rapidly, the unbalanced proportion of data across working conditions makes it difficult to process fault feature data. Existing methods mainly introduce new, non-repeating instances by interpolating between adjacent minority samples. To overcome the shortcoming of these methods, which do not consider boundary conditions when balancing data, an improved over-sampling balancing algorithm, SC-SMOTE (safe circle synthetic minority oversampling technology), is proposed to optimize the data sets. Then, for the balanced data sets, a fault diagnosis method for wind turbine blade icing based on an improved k-nearest neighbors (kNN) classifier is adopted. Experimental results show that, compared with the SMOTE algorithm, the method is effective in diagnosing fan blade icing faults and improves diagnostic accuracy.
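
The interpolation step that this abstract attributes to existing methods can be sketched as baseline SMOTE-style sampling: pick a minority sample, pick one of its k nearest minority neighbours, and place a synthetic point at a random position on the segment between them. This is a minimal standard-library sketch of that baseline only; it does not implement the safe-circle boundary handling of SC-SMOTE, and the function name and parameters are assumptions for illustration. It needs at least two minority samples and Python 3.8+ for `math.dist`.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a base minority sample at random.
        i = rng.randrange(len(minority))
        base = minority[i]

        # Find its k nearest minority neighbours (excluding the sample itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]

        # Interpolate at a random position between the base and one neighbour.
        neighbour = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Because each synthetic point lies on a segment between two real minority samples, it cannot repeat an existing instance exactly unless the gap is zero, but it may land in regions dominated by the majority class, which is the kind of boundary issue the abstract raises.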

PAPR Reduction Method of OFDM System Using Fuzzy Theory (Fuzzy 이론을 이용한 OFDM 시스템에서 PAPR 감소 기법)

  • Lee, Dong-Ho;Choi, Jung-Hun;Kim, Nam;Lee, Bong-Woon
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.21 no.7 / pp.715-725 / 2010
  • An Orthogonal Frequency Division Multiplexing (OFDM) system is effective for high-data-rate transmission over frequency-selective fading channels. In this paper, we propose a PAPR (Peak-to-Average Power Ratio) reduction method for OFDM systems using Fuzzy theory, which is often used in machine control. The advantages of using Fuzzy theory to reduce PAPR are that the data are easy to manage, the method is easy to implement in hardware, and it requires fewer operations. First, we propose a simple algorithm that uses Fuzzy theory to reduce the PAPR of each sub-block, and thereby the overall PAPR of the transmitted signal, which is then reconstructed at the receiver. Although the method has drawbacks, namely that it requires more computation than a conventional OFDM system and that the Fuzzy information must be sent separately, it improves PAPR reduction performance. To evaluate the performance, the proposed search algorithm is compared with the proposed algorithm in terms of the complementary cumulative distribution function (CCDF) of the PAPR and the computational complexity. With QPSK and 16QAM modulation, the Fuzzy theory method reduces PAPR by 2.3 dB and 3.1 dB compared with the existing OFDM system when the FFT size is N = 512 and the oversampling factor is 4, at a base Pr of $10^{-5}$.

New Gain Optimization Method for Sigma-Delta A/D Convertors (Sigma-Delta A/D 변환기의 새로운 이득 최적화 방식)

  • Jung, Yo-Sung;Jang, Young-Beom
    • Journal of the Institute of Electronics Engineers of Korea TC / v.46 no.9 / pp.31-38 / 2009
  • In this paper, we propose a new gain optimization method for Sigma-Delta A/D converters. In the proposed method, 10 candidates are first selected through SNR maximization for the Sigma-Delta modulator. It is then shown that the optimum gains can be obtained through MSE calculation for the CIC decimation filter. In the simulation, the proposed method benefits from combining SNR maximization for the modulator with MSE minimization for the CIC decimation filter. The more candidates are chosen in the SNR maximization for the modulator, the better the gains that can be obtained in the MSE minimization for the CIC decimation filter.

Blind Adaptive Channel Estimation using Multichannel Linear Prediction (다채널 선형예측을 이용한 블라인드 적응 채널 추정)

  • 조주필;안경승;황지원
    • Journal of Korea Multimedia Society / v.6 no.1 / pp.114-120 / 2003
  • Blind estimation of communication channels is a problem of considerable current theoretical interest. Recently proposed solutions to this problem exploit the diversity induced by an antenna array or time oversampling, leading to the so-called second-order statistics techniques. This paper proposes a blind adaptive channel estimation method based on multichannel linear prediction. Computer simulations are presented to compare the proposed algorithm with existing ones.
