• Title/Summary/Keyword: Imbalance Problem

The Detection of Online Manipulated Reviews Using Machine Learning and GPT-3 (Korean title: Detection of Manipulated Reviews Using Machine Learning and GPT3)

  • Chernyaeva, Olga;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.347-364 / 2022
  • Fraudulent companies or sellers strategically manipulate reviews to influence customers' purchase decisions, so the reliability of reviews has become crucial for customer decision-making. Because customers increasingly rely on online reviews for detailed information about products or services before purchasing, many researchers focus on detecting manipulated reviews. The main obstacle, however, is the difficulty of obtaining enough manipulated-review data to train machine learning models. In addition, manipulated reviews are far fewer than non-manipulated ones, which creates a class imbalance problem: the class with fewer examples is under-represented and can reduce a model's accuracy. Solving the class imbalance problem is therefore important for building an accurate detector of manipulated reviews. We propose an OpenAI-based review generation model that addresses the imbalance of manipulated reviews and thereby improves the accuracy of manipulated-review detection. In this research, we applied the novel autoregressive language model GPT-3 to generate reviews based on manipulated reviews. We found that using GPT-3 to oversample manipulated reviews recovers a satisfactory portion of the performance loss and yields better classification performance (logit, decision tree, neural networks) than traditional oversampling methods such as random oversampling and SMOTE.
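
The abstract benchmarks GPT-3 generation against random oversampling and SMOTE. For reference, a minimal pure-Python sketch of those two baselines on numeric feature vectors might look like this. The function names are hypothetical, and applying them to reviews assumes the text has already been turned into numeric features (for example, embeddings), which the abstract does not describe.

```python
import random


def _sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def random_oversample(minority, n_new, seed=0):
    """Random oversampling: add n_new copies of randomly chosen minority samples."""
    rng = random.Random(seed)
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style synthesis: each new point lies on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: _sq_dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, nb)])
    return synthetic
```

Random oversampling only repeats existing points, while SMOTE interpolates new ones inside the minority region; GPT-3 generation instead produces new review text rather than new feature vectors.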

Optimal Ratio of Data Oversampling Based on a Genetic Algorithm for Overcoming Data Imbalance (Korean title: Genetic-Algorithm-Based Optimal Oversampling Ratio for Resolving Data Imbalance)

  • Shin, Seung-Soo;Cho, Hwi-Yeon;Kim, Yong-Hyuk
    • Journal of the Korea Convergence Society / v.12 no.1 / pp.49-55 / 2021
  • Recently, advances in database technology have made it possible to store large amounts of data generated in finance, security, and networks. These data are analyzed with classifiers based on machine learning, and the main problem at this stage is data imbalance. When a classifier is trained on imbalanced data, classification accuracy may degrade because of over-fitting to the majority class. To overcome data imbalance, oversampling strategies that increase the number of minority-class samples are widely used, but choosing a suitable method and its parameters requires tuning to the data distribution. To improve this process, we propose a strategy that uses genetic algorithms to explore and optimize combinations and ratios of oversampling methods such as the synthetic minority oversampling technique (SMOTE) and generative adversarial networks (GANs). We oversampled a credit card fraud detection data set, a representative case of data imbalance, with the proposed strategy and with single oversampling strategies, and compared the performance of classifiers trained on each resulting data set. The strategy whose per-method ratios were optimized by the genetic algorithm outperformed the previous strategies.
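
The optimization loop can be pictured with a generic real-coded genetic algorithm. In this sketch (an assumption, not the authors' code), each gene is the oversampling ratio assigned to one method, for example SMOTE or a GAN, and `fitness` is a caller-supplied function, hypothetically the validation score of a classifier trained on data oversampled with those ratios. Tournament selection, arithmetic crossover, Gaussian mutation, and elitism are common GA choices, not ones confirmed by the paper.

```python
import random
from typing import Callable, List

Chromosome = List[float]


def genetic_search(
    fitness: Callable[[Chromosome], float],
    n_genes: int,
    pop_size: int = 20,
    generations: int = 40,
    mutation_rate: float = 0.2,
    mutation_scale: float = 0.1,
    seed: int = 0,
) -> Chromosome:
    """Real-coded GA that maximises `fitness` over genes constrained to [0, 1]."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]

    def tournament(scored):
        # binary tournament: the fitter of two random individuals becomes a parent
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = sorted(((fitness(c), c) for c in pop), key=lambda s: s[0], reverse=True)
        next_pop = [scored[0][1]]  # elitism: carry the best individual over unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            w = rng.random()  # arithmetic crossover weight
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
            # Gaussian mutation on some genes, clipped back into [0, 1]
            child = [
                min(1.0, max(0.0, g + rng.gauss(0.0, mutation_scale)))
                if rng.random() < mutation_rate
                else g
                for g in child
            ]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)
```

Because every `fitness` call would retrain and score a classifier, caching the scores of already-evaluated chromosomes is worth considering in practice.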

Heterogeneous Ensemble of Classifiers from Under-Sampled and Over-Sampled Data for Imbalanced Data

  • Kang, Dae-Ki;Han, Min-gyu
    • International Journal of Advanced Smart Convergence / v.8 no.1 / pp.75-81 / 2019
  • The data imbalance problem is common and causes serious problems in machine learning. Sampling is one of the effective methods for addressing it. Over-sampling increases the number of instances, so on imbalanced data it is applied to the minority class; under-sampling reduces the number of instances and is usually performed on the majority class. We apply under-sampling and over-sampling to imbalanced data to generate sampled data sets, and from these sampled data sets and the original data set we construct a heterogeneous ensemble of classifiers using five different algorithms. Experimental results on an imbalanced intrusion detection data set show that the approach is effective.
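
The abstract does not name its five algorithms. The pure-Python sketch below only illustrates the general pattern for a binary problem: build one under-sampled and one over-sampled copy of the data, train a different learner on each copy and on the original data, and combine them by majority vote. The three simple learners (a nearest-centroid rule and two k-NN variants) and all function names are hypothetical stand-ins, not the paper's setup.

```python
import random
from collections import Counter


def random_undersample(X, y, majority, seed=0):
    """Keep all minority samples and an equal-sized random subset of the majority."""
    rng = random.Random(seed)
    maj = [i for i, c in enumerate(y) if c == majority]
    mino = [i for i, c in enumerate(y) if c != majority]
    keep = mino + rng.sample(maj, len(mino))
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, minority, seed=0):
    """Duplicate random minority samples until the two classes are the same size."""
    rng = random.Random(seed)
    mino = [i for i, c in enumerate(y) if c == minority]
    extra = [rng.choice(mino) for _ in range(len(y) - 2 * len(mino))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def _sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fit_nearest_centroid(X, y):
    """Assign the class whose mean vector is closest."""
    centroids = {}
    for c in set(y):
        pts = [x for x, t in zip(X, y) if t == c]
        centroids[c] = [sum(col) / len(pts) for col in zip(*pts)]
    return lambda x: min(centroids, key=lambda c: _sq_dist(x, centroids[c]))


def fit_knn(X, y, k):
    """Majority label among the k nearest training samples."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: _sq_dist(x, X[i]))[:k]
        return Counter(y[i] for i in nearest).most_common(1)[0][0]
    return predict


def build_ensemble(X, y, minority, majority):
    """Train a different learner on each data view and combine them by majority vote."""
    Xu, yu = random_undersample(X, y, majority)
    Xo, yo = random_oversample(X, y, minority)
    members = [
        fit_nearest_centroid(Xu, yu),  # under-sampled view
        fit_knn(Xo, yo, k=3),          # over-sampled view
        fit_knn(X, y, k=1),            # original (imbalanced) view
    ]

    def predict(x):
        # three voters on a binary problem can never tie
        return Counter(m(x) for m in members).most_common(1)[0][0]

    return predict
```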

Public Policy and The Imbalance of The Systems: A System Dynamics Approach for The Shock of Lowered Retirement Age of Teachers on Education System

  • Yi, Mi-Sook;Choi, Nam-Hee;Kim, Doa-Hoon
    • Korean System Dynamics Review / v.5 no.2 / pp.149-174 / 2004
  • Since the economic crisis of 1997, the Korean government has implemented a number of reforms to eliminate inefficiencies in both the private and public sectors. One public-sector reform lowered the retirement age of teachers from 65 to 62. The ultimate aim of this compulsory policy was to improve the quality of education by hiring many young teachers in place of senior teachers. It was based on the calculation that, by lowering the retirement age by three years, the government could hire three young teachers with the saved wages. However, the policy had an unexpected result: the imbalance between the supply of and demand for teachers became a much more serious problem in Korea's elementary education system. The purpose of this study is twofold. First, it aims to identify the scope of the imbalances that occurred in the supply-demand system of elementary school teachers in one region of the country and to find out why they occurred. Second, it experiments with feasible policy alternatives and their effects on the system and suggests some resolutions of the imbalance.

Compensation of Phase Noise and IQ Imbalance in the OFDM Communication System of DFT Spreading Method (Korean title: Compensation of Phase Noise and Quadrature Imbalance in DFT-Spread OFDM Communication Systems)

  • Ryu, Sang-Burm;Ryu, Heung-Gyoon
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.20 no.1 / pp.21-28 / 2009
  • DFT-spread OFDM (Discrete Fourier Transform-Spread Orthogonal Frequency Division Multiplexing) is very effective at reducing the PAPR (Peak-to-Average Power Ratio). For this reason, SC-FDMA (Single Carrier-Frequency Division Multiple Access), which is essentially the same as DFT-spread OFDM, was adopted as the uplink standard of 3GPP LTE ($3^{rd}$ Generation Partnership Project Long Term Evolution). Unlike ordinary OFDM, SC-FDMA using the DFT spreading method is vulnerable to ICI (Inter-Carrier Interference) caused by phase noise and IQ (In-phase/Quadrature) imbalance, which also affects the FDE (Frequency Domain Equalizer). In this paper, the ICI effects of phase noise and IQ imbalance, which can be problems in uplink transmission, are analyzed as a function of the back-off level of the HPA (high-power amplifier). We then propose an equalizer algorithm to remove these ICI effects. The proposed FDE-based equalizer can be considered an upgraded and improved version of the PNS (Phase Noise Suppression) algorithm, and it effectively compensates the ICI resulting from phase noise and IQ imbalance. Finally, computer simulation shows that an SNR of about 14 dB is required to achieve $\mathrm{BER}=10^{-4}$ after ICI compensation when the back-off is 4.5 dB, $\varepsilon=0.005$, $\phi=5^{\circ}$, and $pn=0.06\;\mathrm{rad}^2$.
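
The abstract gives amplitude and phase imbalance parameters ($\varepsilon=0.005$, $\phi=5^{\circ}$) but not the signal model. A common baseband model, assumed here and not necessarily the one used in the paper, writes the distorted sample as $r = K_1 x + K_2 x^{*}$ with $K_1 = (1 + g e^{-j\phi})/2$, $K_2 = (1 - g e^{j\phi})/2$ and $g = 1 + \varepsilon$. Because the conjugate term maps each subcarrier onto its mirror image, it is one source of the ICI discussed above. The sketch applies this model and inverts it when $K_1, K_2$ are known; it does not model phase noise, the HPA, or the authors' FDE-based equalizer.

```python
import cmath
import math


def iq_coefficients(eps, phi_deg):
    """(K1, K2) for r = K1*x + K2*conj(x), with gain g = 1 + eps and phase error phi."""
    g = 1.0 + eps
    phi = math.radians(phi_deg)
    k1 = (1 + g * cmath.exp(-1j * phi)) / 2
    k2 = (1 - g * cmath.exp(1j * phi)) / 2
    return k1, k2


def apply_iq_imbalance(samples, k1, k2):
    """Distort complex baseband samples with the assumed IQ-imbalance model."""
    return [k1 * x + k2 * x.conjugate() for x in samples]


def compensate_iq(samples, k1, k2):
    """Exact inverse when K1, K2 are known: conj(K1)*r - K2*conj(r) = (|K1|^2 - |K2|^2) x."""
    det = abs(k1) ** 2 - abs(k2) ** 2
    return [(k1.conjugate() * r - k2 * r.conjugate()) / det for r in samples]


def image_rejection_ratio_db(k1, k2):
    """Power of the wanted term relative to the mirror (conjugate) term."""
    return 10 * math.log10(abs(k1) ** 2 / abs(k2) ** 2)
```

With $\varepsilon = 0.005$ and $\phi = 5^{\circ}$, the phase error dominates $|K_2|$ and the image rejection ratio under this model is roughly 27 dB.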

An Improved SVPWM Control of Voltage Imbalance in Capacitors of a Single-Phase Multilevel Inverter

  • Ramirez, Fernando Arturo;Arjona, Marco A.
    • Journal of Power Electronics / v.15 no.5 / pp.1235-1243 / 2015
  • This paper presents a modified Space Vector Pulse Width Modulation (SVPWM) technique that solves the well-known problem of voltage imbalance in the capacitors of a single-phase multilevel inverter. The proposed solution is based on measuring the DC voltage of each capacitor on the inverter DC bus. The measurements are then used to adjust the size of the active vectors within the SVPWM algorithm so that the output voltage waveform stays sinusoidal regardless of any voltage imbalance on the DC-link capacitors. When a voltage deviation exceeds a predetermined hysteresis band, the corresponding voltage vector is restricted until the voltage returns within an acceptable threshold. As a result, external voltage regulators for the capacitor voltages are no longer needed. The functionality of the proposed algorithm is demonstrated through simulations and experiments in a grid-tied application.
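
To illustrate the two ideas in the abstract, the sketch assumes a hypothetical three-level leg whose output levels are the measured upper capacitor voltage, zero, and the negative of the measured lower capacitor voltage. `dwell_fraction` sizes the active-level time from measured rather than nominal voltages so the period-average output still matches the reference, and `HysteresisLimiter` flags when the capacitor deviation leaves a band. Both names are hypothetical. Which vector to restrict depends on the topology and current direction, which the abstract does not give, so that decision is left out.

```python
from dataclasses import dataclass


def dwell_fraction(v_ref, vc_upper, vc_lower):
    """Duty of the active level for one switching period of a hypothetical
    three-level leg with output levels +vc_upper, 0 and -vc_lower (volts).

    Volt-second balance: d * level + (1 - d) * 0 = v_ref, evaluated with the
    measured capacitor voltages so the average output tracks v_ref even when
    the capacitors are unbalanced. Saturates at 1.0 (overmodulation).
    """
    level = vc_upper if v_ref >= 0 else -vc_lower
    return min(1.0, v_ref / level), level


@dataclass
class HysteresisLimiter:
    """Flags when the capacitor voltage deviation leaves a hysteresis band."""
    band: float     # deviation (V) above which the vector is restricted
    release: float  # deviation (V) below which the restriction is lifted
    restricted: bool = False

    def update(self, vc_upper, vc_lower):
        deviation = abs(vc_upper - vc_lower)
        if not self.restricted and deviation > self.band:
            self.restricted = True
        elif self.restricted and deviation < self.release:
            self.restricted = False
        return self.restricted
```

Using separate trigger and release thresholds keeps the restriction from chattering on and off when the deviation hovers near a single limit.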

Correction of mass imbalance of a high precision rotor (Korean title: Imbalance Correction of a Precision High-Speed Rotor Using Impact)

  • Lee, S.B.;Ihn, Y.S.;Oh, D.H.;Kim, H.Y.;Lee, H.S.;Koo, J.C.
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2007.05a / pp.843-847 / 2007
  • The unbalanced mass of a high precision rotor degrades the rotor's mechanical performance. The geometric center of a rotor generally coincides with its rotational axis; however, this alignment, carried out with the rotor stationary, does not guarantee dynamic balance. Many schemes for correcting imbalance have been published over the decades, especially in the hard disk drive industry, where imbalance directly affects manufacturing cost and product performance. Recognizing the significance of the problem, the present work refines one such method, which works by applying an external impact while the rotor spins. A systematic way of applying the external impact to a rotating rotor is introduced to minimize the imbalance correction process time.

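
Any impact-based correction needs an estimate of where the imbalance sits on the rotor. Assuming a once-per-revolution trigger and vibration samples taken at evenly spaced rotor angles (assumptions; the abstract does not describe the sensing), the 1x component can be extracted with a single-bin DFT as below. Reading the peak angle as the heavy-spot angle further assumes the response is roughly in phase with the imbalance force, as for a rigid rotor running well below resonance. The paper's impact-timing procedure itself is not reproduced, and the function names are hypothetical.

```python
import math


def once_per_rev_component(samples):
    """Amplitude and phase (rad) of the 1x component A*cos(theta + phase), from
    samples taken at n >= 3 equally spaced rotor angles over one revolution."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * k / n) for k, s in enumerate(samples))
    im = -sum(s * math.sin(2 * math.pi * k / n) for k, s in enumerate(samples))
    amplitude = 2 * math.hypot(re, im) / n
    phase = math.atan2(im, re)
    return amplitude, phase


def heavy_spot_angle(phase):
    """Rotor angle at which the 1x response peaks. Treating it as the heavy-spot
    angle assumes negligible lag between imbalance force and measured response."""
    return (-phase) % (2 * math.pi)
```

A constant offset in the samples does not leak into the 1x bin, because the cosine and sine terms each sum to zero over a full revolution.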

Direct acceleration feedback control of a washing machine during spinning process (Korean title: Acceleration Feedback Control of a Drum Washing Machine During Spin-Drying)

  • Lee, Chin-Won;Seichiro, Suzuki;Sun, Hee-Bok
    • Proceedings of the KSME Conference / 2003.11a / pp.1642-1647 / 2003
  • The market for horizontal-axis washing machines (drum washing machines) in Korea has grown by about 80% annually since 2000. As the market grows quickly, customers' quality demands have become stricter and more varied. Imbalance sensing is a key technology for reducing NVH (noise, vibration, and harshness) problems in a washing machine, because the laundry is a time-varying, uncontrollable source of imbalance that can produce an exciting force of more than 200 kgf. In this paper, imbalance-sensing methods are briefly reviewed, new acceleration-sensing circuits are examined, and finally a control algorithm for the spinning process is proposed and validated.

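
The 200 kgf figure is consistent with the rotating-unbalance force $F = m r \omega^{2}$. The sketch below evaluates it; the load used in the check (0.5 kg of unevenly distributed laundry at a 0.25 m drum radius, spinning at 1200 rpm) is an illustrative assumption, not a value from the paper, and gives roughly 201 kgf.

```python
import math

G = 9.80665  # standard gravity in m/s^2; 1 kgf = 9.80665 N


def unbalance_force_kgf(mass_kg, radius_m, speed_rpm):
    """Rotating-unbalance force F = m * r * omega^2, returned in kgf."""
    omega = speed_rpm * 2 * math.pi / 60  # rad/s
    return mass_kg * radius_m * omega ** 2 / G
```

Because the force grows with the square of the speed, the same load produces a quarter of the force at half the speed.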

Some Further Consideration for the Image Retrieving of Synthetic Aperture Radiometer

  • Liu, Hao;Wu, Ji;Wu, Qiong
    • Proceedings of the KSRS Conference / 2003.11a / pp.1349-1351 / 2003
  • In this paper, a theoretical channel model of the Synthetic Aperture Radiometer is presented. Based on this model, the effects of amplitude imbalance, phase imbalance, and mutual coupling between channels on brightness temperature image retrieval are analyzed. Computer simulation results are also presented to identify the cause of the along-track streaks that usually appear in the retrieved brightness temperature image. In addition, a new system calibration approach is introduced to solve this problem.

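
A simple way to see why channel imbalance corrupts the retrieved image: in an interferometric radiometer each visibility sample is the cross-correlation of two channels, so per-channel amplitude and phase errors multiply it by $g_i g_j^{*}$, and a phase error rotates the sample. The sketch assumes this idealized model, ignores mutual coupling, and corrects the visibility with a reference signal of known visibility (for instance, correlated noise injection). It is not the calibration approach proposed in the paper, and the function names are hypothetical.

```python
import cmath
import math


def channel_gain(amplitude, phase_deg):
    """Complex gain of one receiver channel with amplitude and phase imbalance."""
    return amplitude * cmath.exp(1j * math.radians(phase_deg))


def measured_visibility(true_vis, g_i, g_j):
    """Cross-correlation of channels i and j is scaled by g_i * conj(g_j)."""
    return g_i * g_j.conjugate() * true_vis


def calibrate(meas_vis, ref_meas, ref_true):
    """Divide out the baseline gain estimated from a reference of known visibility."""
    baseline_gain = ref_meas / ref_true
    return meas_vis / baseline_gain
```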

Ensemble Learning for Solving Data Imbalance in Bankruptcy Prediction (Korean title: Ensemble Learning for Resolving the Imbalance Problem in Corporate Insolvency Prediction Data)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.15 no.3 / pp.1-15 / 2009
  • In a classification problem, data imbalance occurs when the number of instances in one class greatly outnumbers the number of instances in the other class. Such data sets often produce a classifier that defaults to the majority class because of a skewed decision boundary, which reduces its classification accuracy. This paper proposes Geometric Mean-based Boosting (GM-Boost) to resolve the problem of data imbalance. Because GM-Boost uses the geometric mean, its learning process considers both the majority and minority classes and reinforces learning on misclassified data. An empirical study on bankruptcy prediction for Korean companies shows that GM-Boost achieves higher classification accuracy than previous methods used for imbalanced data, including under-sampling, over-sampling, and AdaBoost, and that its learning performance is robust regardless of the degree of data imbalance.

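
The abstract says GM-Boost uses a geometric mean so that learning accounts for both the majority and minority classes. The standard imbalanced-learning quantity of that kind is the G-mean, $\sqrt{\text{sensitivity} \times \text{specificity}}$; whether GM-Boost uses exactly this expression is an assumption. The sketch below computes only the metric, not the boosting procedure, to show why plain accuracy misleads on imbalanced data. The 95:5 class split in the check is made up for illustration.

```python
import math


def confusion_rates(y_true, y_pred, positive=1):
    """Sensitivity (recall on the positive class) and specificity (recall on the rest)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of per-class recalls; zero if either class is never recognised."""
    sensitivity, specificity = confusion_rates(y_true, y_pred, positive)
    return math.sqrt(sensitivity * specificity)
```

On a 95:5 split, a classifier that labels every firm as non-bankrupt scores 95% accuracy yet has a G-mean of zero.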