• Title/Summary/Keyword: 불균형 (imbalance)

Resolving CTGAN-based data imbalance for commercialization of public technology (공공기술 사업화를 위한 CTGAN 기반 데이터 불균형 해소)

  • Hwang, Chul-Hyun
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.1 / pp.64-69 / 2022
  • Commercialization of public technology is the transfer of government-led scientific and technological innovation and R&D results to the private sector, and it is recognized as a key achievement driving economic growth. To promote technology transfer, various machine learning methods are therefore being studied to identify success factors or to match public technologies that have high commercialization potential with the companies that need them. However, public technology commercialization data is tabular and imbalanced, with a large gap between the numbers of success and failure cases, so machine learning performance on it is limited. In this paper, we present a method that uses CTGAN to resolve the imbalance in tabular public technology data. To verify the effectiveness of the proposed method, we ran a comparative experiment against SMOTE, a statistical approach, using actual public technology commercialization data. In many of the experimental cases, CTGAN was confirmed to predict successful public technology commercialization cases reliably.

Class Imbalance Resolution Method and Classification Algorithm Suggesting Based on Dataset Type Segmentation (데이터셋 유형 분류를 통한 클래스 불균형 해소 방법 및 분류 알고리즘 추천)

  • Kim, Jeonghun;Kwahk, Kee-Young
    • Journal of Intelligence and Information Systems / v.28 no.3 / pp.23-43 / 2022
  • As AI (Artificial Intelligence) is applied in various industries, interest in algorithm selection is increasing. Algorithm selection is largely determined by the experience of a data scientist; an inexperienced data scientist instead selects an algorithm through meta-learning based on dataset characteristics. Because that selection process is a black box, however, it has not been possible to know on what basis existing algorithm recommendations were derived. Accordingly, this study uses k-means cluster analysis to classify datasets into types according to their characteristics and explores suitable classification algorithms and class imbalance resolution methods for each type. As a result, four types were derived, and an appropriate class imbalance resolution method and classification algorithm were recommended for each dataset type.

Consensus-Based Distributed Algorithm for Optimal Resource Allocation of Power Network under Supply-Demand Imbalance (수급 불균형을 고려한 전력망의 최적 자원 할당을 위한 일치 기반의 분산 알고리즘)

  • Young-Hun, Lim
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.15 no.6 / pp.440-448 / 2022
  • With the recent introduction of distributed energy resources, the optimal resource allocation problem of the power network has become increasingly important, and distributed resource allocation methods are required to process the huge amounts of data in large-scale power networks. Many studies of the optimal resource allocation problem address the case in which the supply-demand balance is satisfied under the generation capacity limits of each generator, but few consider the supply-demand imbalance in which total demand exceeds the maximum generation capacity. In this paper, we propose a consensus-based distributed algorithm for the optimal resource allocation of a power network that handles the supply-demand imbalance condition as well as the balance condition. The proposed algorithm is designed to allocate the optimal resources when the supply-demand balance condition is satisfied and to measure the amount of required resources when supply and demand are imbalanced. Finally, we conduct simulations to verify the performance of the proposed algorithm.
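
To make the idea of a consensus-based distributed computation concrete, here is a minimal sketch of plain average consensus, not the algorithm proposed in the paper. Each node exchanges values only with its neighbors, yet every node ends up with an estimate of the network-wide average mismatch between demand and generation capacity. The 4-node ring topology, the demand and capacity figures, and the step size are all illustrative assumptions.

```python
def average_consensus(values, neighbors, step=0.2, iterations=200):
    """Run plain average consensus: each node moves toward its neighbors' values.

    `step` must be small relative to the node degrees for the iteration to
    converge (0.2 is safe for the degree-2 ring used below).
    """
    x = list(values)
    for _ in range(iterations):
        x = [xi + step * sum(x[j] - xi for j in neighbors[i])
             for i, xi in enumerate(x)]
    return x


# Hypothetical 4-node ring network (assumption, not from the paper).
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
demand = [30.0, 45.0, 25.0, 40.0]      # local demand at each node
capacity = [35.0, 30.0, 30.0, 35.0]    # local maximum generation
mismatch = [d - c for d, c in zip(demand, capacity)]

# After consensus, every node holds (approximately) the average mismatch.
estimates = average_consensus(mismatch, neighbors)

# Multiplying by the node count gives each node a local estimate of the
# total shortfall; a positive value means total demand exceeds capacity.
shortfall = [len(mismatch) * e for e in estimates]
```

In this toy network the total shortfall is positive, which corresponds to the supply-demand imbalance case described in the abstract; how the paper's algorithm actually measures the required resources is not specified here.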

Resolving data imbalance through differentiated anomaly data processing based on verification data (검증데이터 기반의 차별화된 이상데이터 처리를 통한 데이터 불균형 해소 방법)

  • Hwang, Chulhyun
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.179-190 / 2022
  • Data imbalance refers to a phenomenon in which the number of samples in one category is much larger or much smaller than in another category. It has been identified as a major factor that degrades the performance of machine learning with classification algorithms. To solve the data imbalance problem, various oversampling methods that amplify minority-class data have been proposed, of which SMOTE is the most representative. To maximize the amplification effect on minority-class data, methods have emerged that remove noise from the data (SMOTE-IPF) or reinforce only the class borderline (Borderline SMOTE). This paper proposes a method that ultimately improves classification performance by improving how anomaly data is processed in the traditional SMOTE method for amplifying minority-class data. In experiments, the proposed method consistently showed relatively high classification performance compared with the existing methods.
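
As a rough illustration of how SMOTE-style oversampling amplifies minority-class data, the sketch below interpolates between a minority sample and one of its nearest minority neighbors. It shows only the basic SMOTE idea, not the anomaly-handling method proposed in the paper; the function name `smote_sketch`, the toy data, and the choice of `k` are assumptions made for illustration.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (excluding itself).
        nearest = sorted((p for p in minority if p is not base),
                         key=lambda p: sq_dist(base, p))[:k]
        neighbor = rng.choice(nearest)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbor)))
    return synthetic


# Toy minority class with two features (illustrative values only).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
```

Because every synthetic point lies on a segment between two real minority samples, a noisy or anomalous minority sample also spawns synthetic points around it, which is the kind of weakness that variants such as SMOTE-IPF and Borderline SMOTE try to address.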

Boosting the Performance of the Predictive Model on the Imbalanced Dataset Using SVM Based Bagging and Out-of-Distribution Detection (SVM 기반 Bagging과 OoD 탐색을 활용한 제조공정의 불균형 Dataset에 대한 예측모델의 성능향상)

  • Kim, Jong Hoon;Oh, Hayoung
    • KIPS Transactions on Software and Data Engineering / v.11 no.11 / pp.455-464 / 2022
  • Datasets from a manufacturing process have two distinctive characteristics: severe class imbalance and many Out-of-Distribution samples. Strategies such as oversampling the minority class and down-sampling the majority class are well known for handling class imbalance, and SMOTE has recently been chosen to address the issue as well. Out-of-Distribution samples, however, have been studied only with neural networks; Out-of-Distribution detection has hardly been applied to predictive models that use conventional machine learning algorithms such as SVM, Random Forest and KNN. Conventional machine learning algorithms are known to be much better than neural networks in prediction performance, because neural networks are vulnerable to over-fitting and require much bigger datasets than conventional machine learning algorithms do. We therefore suggest a new approach that uses Out-of-Distribution detection based on the SVM algorithm. In addition, the bagging technique is adopted to improve the precision of the model.

Development of machine learning model for reefer container failure determination and cause analysis with unbalanced data (불균형 데이터를 갖는 냉동 컨테이너 고장 판별 및 원인 분석을 위한 기계학습 모형 개발)

  • Lee, Huiwon;Park, Sungho;Lee, Seunghyun;Lee, Seungjae;Lee, Kangbae
    • Journal of the Korea Convergence Society / v.13 no.1 / pp.23-30 / 2022
  • The failure of a reefer container causes a large financial loss, but the current reefer container alarm system is inefficient. Studies using simulation data of refrigeration systems exist, but studies using actual operation data of reefer containers are lacking. Therefore, this study classified the causes of failure using actual reefer container operation data. The actual data was imbalanced, and the study addressed the imbalance by comparing logistic regression with ENN-SMOTE and class weights against the 2-stage algorithm developed in this study. The 2-stage algorithm uses XGBoost, LGBoost, and DNN to separate faults from normal operation in the first stage and to classify the causes of faults in the second stage. The model using LGBoost in the 2-stage algorithm performed best, with 99.16% accuracy. This study proposes a final model that uses the 2-stage algorithm to solve data imbalance, and it is expected to be applicable to other industries.
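
The control flow of such a two-stage classifier can be sketched as follows. This is only a structural illustration: the two rule-based functions stand in for trained models (the paper uses XGBoost, LGBoost, and DNN), and the field names `temp_error` and `current`, the thresholds, and the cause labels are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, float]


def two_stage_predict(records: Sequence[Record],
                      is_fault: Callable[[Record], bool],
                      fault_cause: Callable[[Record], str]) -> List[str]:
    """Stage 1 separates faults from normal records; stage 2 labels only the faults."""
    labels = []
    for record in records:
        if not is_fault(record):
            labels.append("normal")
        else:
            # Stage 2 never sees normal records, so the dominant normal
            # class cannot swamp the rare fault causes.
            labels.append(fault_cause(record))
    return labels


# Hypothetical stand-ins for trained stage-1 and stage-2 models.
def is_fault(record: Record) -> bool:
    return abs(record["temp_error"]) > 2.0


def fault_cause(record: Record) -> str:
    return "compressor" if record["current"] < 1.0 else "sensor"


records = [
    {"temp_error": 0.5, "current": 3.0},
    {"temp_error": 4.0, "current": 0.2},
    {"temp_error": -3.5, "current": 2.5},
]
predictions = two_stage_predict(records, is_fault, fault_cause)
```

Splitting the problem this way turns one heavily imbalanced multi-class task into a binary task plus a smaller multi-class task over fault records only, which is presumably why the abstract presents it as a way to handle data imbalance.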

Utilizing Minimal Label Data for Tomato Leaf Disease Classification: An Approach through Recursive Learning Based on YOLOv8 (토마토 잎 병해 분류를 위한 최소 라벨 데이터 활용: YOLOv8 기반 재귀적 학습 방식을 통한 접근)

  • Junhyuk Lee;Namhyoung Kim
    • The Journal of Bigdata / v.9 no.1 / pp.61-73 / 2024
  • Class imbalance is one of the significant challenges in deep learning tasks and is particularly pronounced in areas with limited data. This study proposes a new approach that uses minimal labeled data to classify tomato leaf diseases effectively. We introduce a recursive learning method based on the YOLOv8 model: the model's detection predictions on training images are used as additional training data, so the amount of labeled data increases progressively. Unlike conventional data augmentation and up- or down-sampling techniques, this method seeks to solve the class imbalance problem fundamentally by maximizing the utility of actual data. Tomato leaves were extracted based on the labeled data obtained in this way, and diseases were classified using the EfficientNet model. This process achieved a high accuracy of 98.92%. Notably, detection of late blight, the disease with the least data, improved by 12.9% over the baseline. This research presents a methodology that addresses data imbalance while offering high-precision disease classification, and it is expected to be applicable to other crops.
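
The recursive idea of feeding a model's own predictions back in as training labels can be sketched with a generic self-training loop. This is not the YOLOv8 pipeline from the paper: a one-dimensional nearest-centroid classifier stands in for the detector, and the confidence rule (`min_margin`), the number of rounds, and the toy data are assumptions chosen only to show the loop structure.

```python
def fit_centroids(labeled):
    """Compute the mean feature value of each class."""
    sums = {}
    for x, y in labeled:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def predict(centroids, x):
    """Return the nearest class and a margin over the second-nearest class."""
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - centroids[runner_up]) - abs(x - centroids[best])
    return best, margin


def recursive_labeling(labeled, unlabeled, min_margin=2.5, rounds=5):
    """Repeatedly retrain and absorb confident predictions as new labels."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        centroids = fit_centroids(labeled)
        accepted, remaining = [], []
        for x in pool:
            label, margin = predict(centroids, x)
            if margin >= min_margin:
                accepted.append((x, label))
            else:
                remaining.append(x)
        if not accepted:
            break  # nothing confident enough; stop growing the label set
        labeled += accepted
        pool = remaining
    return labeled, pool


# Two seed labels and three unlabeled samples (illustrative values only).
seed_labels = [(0.0, "healthy"), (10.0, "late_blight")]
unlabeled = [3.0, 4.0, 9.0]
labeled, leftover = recursive_labeling(seed_labels, unlabeled)
```

In this toy run the sample at 4.0 is too ambiguous in the first round, but it becomes confidently labeled in the second round once the newly absorbed samples have shifted the class centroids, which mirrors how the labeled set grows progressively.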

Symbol Error Probability of DVB-S2 System with I/Q Unbalances (I/Q 불균형이 고려된 DVB-S2 시스템의 심벌 오류 확률)

  • Im, In-Chul;Won, Seung-Chan;Yoon, Dong-Weon;Park, Sang-Kyu
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.9C / pp.810-819 / 2007
  • The I/Q unbalance generated by non-ideal components such as a $90^{\circ}$ phase shifter and I/Q filters is an inevitable physical phenomenon, and it degrades performance when a coherent two-dimensional (2-D) modulation/demodulation system is implemented. This paper provides an exact and general expression for the SEP (symbol error probability) of the DVB-S2 system with I/Q phase and amplitude unbalance over an AWGN channel. Coordinate rotation and shift techniques, used to redefine the received signal, are the key mathematical tools. The derived result is expressed as a linear combination of 2-D Gaussian Q-functions.

Behavior of Liquid Droplet Driven by Capillarity Force Imbalance on Horizontal Surface Under Various Conditions (다양한 조건하에서 모세관력 불균형에 의해 구동되는 수평 표면 위의 액적 거동)

  • Myong, Hyon Kook;Kwon, Young Hoo
    • Transactions of the Korean Society of Mechanical Engineers B / v.39 no.4 / pp.359-370 / 2015
  • The present study numerically investigates the behavior of a liquid droplet driven by capillarity force imbalance on horizontal surfaces ranging from hydrophilic to hydrophobic, under various conditions. The droplet behavior was simulated using an in-house solution code (PowerCFD), which employs an unstructured cell-centered method based on a conservative pressure-based finite-volume method, with an interface capturing method (CICSAM) in a volume of fluid (VOF) scheme for phase interface capturing. The detailed droplet behavior was obtained for droplets with different initial shapes, contact angles and surface tension forces (or Bond numbers). The mechanism of droplet transport was examined using the numerical results on the droplet shapes.

A Study on Price Liberalization in the Insurance Industry (보험산업의 가격자유화에 관한 연구)

  • Na, Dong-Min
    • KDI Journal of Economic Policy / v.16 no.2 / pp.91-109 / 1994
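  • This paper analyzes what pricing policy should be established to increase the utility and expected profit of all market participants in an insurance market where there is an information imbalance regarding the risk level of the insured. Based on this analysis, it evaluates the direction of the price liberalization plan currently being pursued by the government and suggests improvements. According to the analysis, under information imbalance, limited price liberalization in the initial stage of liberalization increases the utility of insurance users as a whole, but as the scope of liberalization widens, the effect on society-wide utility becomes unclear. In this case, it is desirable for the government to induce users to purchase, as the main contract, a contract that offers a single pooled rate, which does not differentiate premium rates and coverage by risk within a certain range, and to let each user choose appropriate coverage at a price differentiated by risk level in the rider portion, which serves as the supplementary contract. Such a contract, composed of a main contract and a supplementary contract, is Pareto-superior to the existing contract based on a single pooled rate and will bring a net increase in utility to the market as a whole. In addition, once the information imbalance in the insurance market has been resolved through a stronger duty of disclosure and more efficient risk classification and selection, substantive full price liberalization should be implemented to achieve Pareto optimality. The government should therefore take the characteristics of the insurance market into account and pursue a pricing policy that maximizes the utility and expected profit of all market participants under the given conditions, and the insurance product price liberalization plan currently under way should be re-examined from this perspective.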
