• Title/Summary/Keyword: 정보불균형 (information imbalance)

Development of machine learning model for reefer container failure determination and cause analysis with unbalanced data (불균형 데이터를 갖는 냉동 컨테이너 고장 판별 및 원인 분석을 위한 기계학습 모형 개발)

  • Lee, Huiwon;Park, Sungho;Lee, Seunghyun;Lee, Seungjae;Lee, Kangbae
    • Journal of the Korea Convergence Society / v.13 no.1 / pp.23-30 / 2022
  • Reefer container failures cause large cost losses, but the current reefer container alarm system is inefficient. Existing studies have used simulation data of refrigeration systems, but studies using actual operating data of refrigerated containers are lacking. This study therefore classified the causes of failure using actual refrigerated container operating data. The actual data were imbalanced, and the imbalance problem was addressed by comparing logistic regression with ENN-SMOTE and class weighting against the 2-stage algorithm developed in this study. The 2-stage algorithm uses XGBoost, LGBoost, and DNN to separate faults from normal operation in the first step and to classify the causes of faults in the second step. The 2-stage model using LGBoost performed best, with 99.16% accuracy. This study proposes a final model that uses the two-stage algorithm to address data imbalance and is expected to be applicable to other industries.
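The 2-stage idea above can be sketched as follows. This is a minimal illustration only: it swaps the authors' XGBoost, LGBoost, and DNN models for a hypothetical nearest-centroid stand-in written in plain Python, and the class names `NearestCentroid` and `TwoStageClassifier` and the toy fault labels (`compressor`, `sensor`) are assumptions, not the study's code.

```python
from collections import defaultdict
from math import dist


class NearestCentroid:
    """Hypothetical stand-in base model: predicts the label with the closest mean vector."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] += 1
        self.centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda lab: dist(x, self.centroids[lab]))


class TwoStageClassifier:
    """Stage 1 separates normal from faulty samples; stage 2 assigns a fault cause."""

    def __init__(self, make_model, normal_label="normal"):
        self.make_model = make_model
        self.normal_label = normal_label

    def fit(self, X, y):
        # Stage 1: binary model (True = normal) trained on every sample.
        self.stage1 = self.make_model().fit(X, [lab == self.normal_label for lab in y])
        # Stage 2: cause model trained only on faulty samples, so the large
        # normal class no longer dominates the multi-class problem.
        faulty = [(x, lab) for x, lab in zip(X, y) if lab != self.normal_label]
        self.stage2 = self.make_model().fit([x for x, _ in faulty], [lab for _, lab in faulty])
        return self

    def predict(self, X):
        # Only samples that stage 1 flags as faulty are passed to stage 2.
        return [
            self.normal_label if self.stage1.predict_one(x) else self.stage2.predict_one(x)
            for x in X
        ]


X = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5, 6], [-5, 5], [-5, 6]]
y = ["normal"] * 3 + ["compressor"] * 2 + ["sensor"] * 2
model = TwoStageClassifier(NearestCentroid).fit(X, y)
print(model.predict([[0, 0.05], [5, 5.5], [-5, 5.5]]))
# -> ['normal', 'compressor', 'sensor']
```

Any classifier exposing `fit` and `predict_one` could be plugged in through `make_model`; the point of the split is that the second stage sees only fault classes.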

Short-Term Information Security Training Courses (정보보호 단기 교육과정)

  • 송철복
    • Review of KIISC / v.13 no.2 / pp.26-31 / 2003
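  • The nationwide Internet outage of January 25, 2003 starkly revealed how vulnerable Korea, which had prided itself on being an "Internet powerhouse," is in information security. The incident also served as a fresh reminder of the importance of the information security industry. Called the "defense industry of the information society" and the "growth engine of the IT industry," the information security industry is steadily gaining strategic and economic importance. As cyber threats become increasingly intelligent and sophisticated, demand for information security personnel is rising, but supply is not keeping pace: a supply-demand imbalance (a supply shortage) of more than about 22,000 people is expected between 2003 and 2007. To resolve this imbalance, universities, private educational institutions, and government agencies are working to train information security personnel. This paper examines short-term training courses that focus on improving technical competitiveness in information security and raising information security awareness.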

The Impact of Information on Stock Message Boards on Stock Trading Behaviors of Individual Investors based on Order Imbalance Analysis (온라인 주식게시판 정보가 주식투자자의 거래행태에 미치는 영향)

  • Kim, Hyun Mo;Park, Jae Hong
    • Information Systems Review / v.18 no.2 / pp.23-38 / 2016
  • Previous studies in information systems (IS) and finance suggest that information on stock message boards influences the investment decisions of individual investors. However, how information on online stock message boards influences an individual investor's buy or sell decisions is unclear. To address this research question, we investigate the relationship between the number of posts on stock message boards and order imbalance in stock markets. Order imbalance is defined as the difference between the daily sum of buy-side shares traded and the daily sum of sell-side shares traded; it therefore indicates both the direction of trades and, through trading volume, the strength of that direction. In this regard, this study examines how the number of posts (information on stock message boards) influences order imbalance (stock trading behavior). We collected about 46,077 messages on 40 companies listed on the Korea Composite Stock Price Index from Paxnet, the most popular Korean online stock message board. The collected messages were divided into in-trading and after-trading hours to examine the relationship between the number of posts and trading volumes. We also collected order imbalance data for individual investors. We then integrated the balanced panel data sets and analyzed them through vector regression. We found that the number of posts on online stock message boards is positively related to prior order imbalance. We believe that our findings contribute to knowledge in IS and finance. Furthermore, this study suggests that investors should carefully monitor information on stock message boards to understand stock market sentiment.
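The order imbalance definition above reduces to a per-day signed sum. The sketch below assumes trades arrive as hypothetical `(date, side, shares)` tuples (the study's actual data layout is not described) and uses made-up dates and volumes: a positive value signals net buying, a negative value net selling, and the magnitude reflects the strength of that direction.

```python
from collections import defaultdict


def daily_order_imbalance(trades):
    """Order imbalance per day: buy-side shares minus sell-side shares.

    `trades` is an iterable of (date, side, shares) tuples with side in
    {"buy", "sell"}; this input layout is an assumption for illustration.
    """
    imbalance = defaultdict(int)
    for date, side, shares in trades:
        if side == "buy":
            imbalance[date] += shares
        elif side == "sell":
            imbalance[date] -= shares
        else:
            raise ValueError(f"unknown side: {side!r}")
    return dict(imbalance)


trades = [
    ("2016-03-02", "buy", 1200),
    ("2016-03-02", "sell", 800),
    ("2016-03-03", "buy", 300),
    ("2016-03-03", "sell", 900),
]
print(daily_order_imbalance(trades))
# -> {'2016-03-02': 400, '2016-03-03': -600}
```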

Processing Method of Unbalanced Data for a Fault Detection System Based Motor Gear Sound (모터 동작음 기반 불량 검출 시스템을 위한 불균형 데이터 처리 방안 연구)

  • Lee, Younghwa;Choi, Geonyoung;Park, Gooman
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.06a / pp.1305-1307 / 2022
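  • Because defects in automotive parts can degrade the performance of the entire system and cause human and material losses, defect detection on the production line is very important. Accordingly, a variety of deep learning-based fault diagnosis systems are being studied to achieve accurate and consistent defect detection. In manufacturing, however, abnormal samples occur far less often than normal samples. This leads to a class imbalance problem in the training data, and the imbalance affects the performance of the classification model that identifies faults. This study proposes a way to handle data imbalance in designing a defect detection system that identifies defective motors from their operating sounds. The operating sounds of automotive side-mirror motors were used as the training and test datasets, and a comparative analysis of the label-distribution-aware margin (LDAM) loss, which takes the number of samples per class in the training dataset into account when computing the loss, combined with the Inception, ResNet, and DenseNet neural network models, demonstrated the potential for handling imbalanced data.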
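The LDAM loss mentioned above, in its published form, gives each class j a margin proportional to n_j^(-1/4), so rarer classes get larger margins, and subtracts the target class's margin from its logit before the usual softmax cross-entropy. The plain-Python sketch below follows that formulation; the function names, the default `max_margin=0.5`, and the omission of the logit scaling factor used in some implementations are assumptions, and the paper's exact training setup is not reproduced.

```python
import math


def ldam_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j ** (-1/4), rescaled so the largest equals max_margin."""
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [scale * r for r in raw]


def ldam_loss(logits, target, margins):
    """Softmax cross-entropy after subtracting the target class's margin from its logit."""
    adjusted = list(logits)
    adjusted[target] -= margins[target]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(adjusted)
    log_sum = m + math.log(sum(math.exp(z - m) for z in adjusted))
    return log_sum - adjusted[target]


margins = ldam_margins([900, 100])  # class 0: normal (majority), class 1: defective (minority)
print([round(v, 4) for v in margins])
# -> [0.2887, 0.5]
```

Because the minority class has the larger margin, a minority-class sample must beat the other logits by a wider gap before its loss becomes small, which pushes the decision boundary away from the rare class.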

Touring Regional Information Centers: Gangwon Information Center Co., Ltd. (KITEL) (지역 정보센터 순례 - (주)강원정보센터(KITEL))

  • Korea Database Promotion Center
    • Digital Contents / no.3 s.46 / pp.46-47 / 1997
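  • Gangwon Information Center, building on the vision of creating an Asian information nation through information sharing, is working toward its ambitious goal of growing into a world-class information center. This article looks at Gangwon Information Center and its efforts to resolve the information imbalance between regions.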

A Pipelined Hash Join Method for Load Balancing (부하 균형 유지를 고려한 파이프라인 해시 조인 방법)

  • Moon, Jin-Gue;Park, No-Sang;Kim, Pyeong-Jung;Jin, Seong-Il
    • The KIPS Transactions:PartD / v.9D no.5 / pp.755-768 / 2002
  • We investigate the effect of data skew in join attributes on the performance of a pipelined multi-way hash join method and propose two new hash join methods with load balancing capabilities. The first proposed method allocates buckets statically in a round-robin fashion, and the second allocates buckets adaptively based on a frequency distribution. With hash-based joins, multiple joins can be pipelined so that early results from a join are sent to the next join before the whole join completes, without being stored on disk. Unless the pipelined execution of multiple hash joins includes some load balancing mechanism, the skew effect can severely degrade system performance. In this paper, we derive an execution model of the pipeline segment and a cost model, and develop a simulator for the study. Our simulation over a wide range of parameters shows that join selectivities and relation sizes degrade system performance more as the degree of data skew increases. However, the proposed method, which uses a large number of buckets and a tuning technique, can offer substantial robustness against a wide range of skew conditions.
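The two bucket allocation strategies above can be contrasted with a small sketch. The round-robin rule follows the abstract directly; for the frequency-based method, the abstract does not give the exact rule, so this sketch assumes a greedy "largest bucket to the least-loaded processor" heuristic. The bucket sizes are hypothetical tuple counts with two heavily skewed buckets.

```python
import heapq


def round_robin(bucket_sizes, num_procs):
    """Static allocation: bucket i goes to processor i mod num_procs."""
    loads = [0] * num_procs
    for i, size in enumerate(bucket_sizes):
        loads[i % num_procs] += size
    return loads


def frequency_based(bucket_sizes, num_procs):
    """Adaptive allocation (assumed greedy rule): largest buckets first,
    each assigned to the currently least-loaded processor."""
    heap = [(0, p) for p in range(num_procs)]  # (current load, processor id)
    loads = [0] * num_procs
    for size in sorted(bucket_sizes, reverse=True):
        load, p = heapq.heappop(heap)
        loads[p] = load + size
        heapq.heappush(heap, (loads[p], p))
    return loads


skewed = [10, 1, 1, 1, 9, 1, 1, 1]  # hypothetical tuple counts per hash bucket
print(round_robin(skewed, 2))      # -> [21, 4]
print(frequency_based(skewed, 2))  # -> [13, 12]
```

Having many small buckets gives the greedy rule finer-grained units to balance, which is in line with the abstract's observation that a large number of buckets helps against skew.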

A Load Balancing Method using Partition Tuning for Pipelined Multi-way Hash Join (다중 해시 조인의 파이프라인 처리에서 분할 조율을 통한 부하 균형 유지 방법)

  • Mun, Jin-Gyu;Jin, Seong-Il;Jo, Seong-Hyeon
    • Journal of KIISE:Databases / v.29 no.3 / pp.180-192 / 2002
  • We investigate the effect of data skew in join attributes on the performance of a pipelined multi-way hash join method and propose two new hash join methods for the shared-nothing multiprocessor environment. The first proposed method allocates buckets statically in a round-robin fashion, and the second allocates buckets dynamically based on a frequency distribution. With hash-based joins, multiple joins can be pipelined so that early results from a join are sent to the next join before the whole join completes, without being stored on disk. Shared-nothing multiprocessor architecture is known to scale better for very large databases. However, this hardware structure is very sensitive to data skew. Unless the pipelined execution of multiple hash joins includes some dynamic load balancing mechanism, the skew effect can severely degrade system performance. In this paper, we derive an execution model of the pipeline segment and a cost model, and develop a simulator for the study. Our simulation over a wide range of parameters shows that join selectivities and relation sizes degrade system performance more as the degree of data skew increases. However, the proposed method, which uses a large number of buckets and a tuning technique, can offer substantial robustness against a wide range of skew conditions.
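The pipelining described above, where a join's early results flow straight into the next join instead of being written to disk, can be illustrated with Python generators. This is a single-process, in-memory sketch under stated assumptions: the relations `R1`, `R2`, `R3`, their columns, and the helper names `build` and `probe` are hypothetical, and processors, partitioning, and skew are not modeled.

```python
from collections import defaultdict


def build(relation, key):
    """Build phase: hash the inner relation on its join key."""
    table = defaultdict(list)
    for row in relation:
        table[key(row)].append(row)
    return table


def probe(stream, table, key):
    """Probe phase as a generator: each match is yielded as soon as it is found."""
    for row in stream:
        for match in table.get(key(row), ()):
            yield row + match


# Hypothetical relations: R1(order_id, cust_id), R2(cust_id, name), R3(order_id, item)
R1 = [(1, 10), (2, 20), (3, 30)]
R2 = [(10, "alice"), (20, "bob")]
R3 = [(1, "pen"), (1, "ink"), (2, "cup")]

t2 = build(R2, key=lambda r: r[0])
t3 = build(R3, key=lambda r: r[0])
# Join 1's output feeds join 2 directly; no intermediate result is materialized.
join1 = probe(iter(R1), t2, key=lambda r: r[1])
join2 = probe(join1, t3, key=lambda r: r[0])
for row in join2:
    print(row)
# (1, 10, 10, 'alice', 1, 'pen')
# (1, 10, 10, 'alice', 1, 'ink')
# (2, 20, 20, 'bob', 2, 'cup')
```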

The Optimization of Ensembles for Bankruptcy Prediction (기업부도 예측 앙상블 모형의 최적화)

  • Myoung Jong Kim;Woo Seob Yun
    • Information Systems Review / v.24 no.1 / pp.39-57 / 2022
  • This paper proposes the GMOPTBoost algorithm to improve the performance of the AdaBoost algorithm for bankruptcy prediction, a task in which the class imbalance problem is inherent. The AdaBoost algorithm has the advantage of providing robust learning opportunities for misclassified samples. However, it is limited in addressing the class imbalance problem because the concept of arithmetic mean accuracy is embedded in the algorithm. GMOPTBoost can optimize geometric mean accuracy and effectively address the class imbalance problem by applying Gaussian gradient descent. The samples are constructed in the following two phases. First, five class-imbalanced datasets are constructed to verify the effect of the class imbalance problem on the performance of the prediction model and the performance improvement from GMOPTBoost. Second, class-balanced data are constructed through data sampling techniques to verify the performance improvement from GMOPTBoost. The main results of 30 cross-validation analyses are as follows. First, the class imbalance problem degrades the performance of ensembles. Second, GMOPTBoost improves the performance of AdaBoost ensembles trained on imbalanced datasets. Third, data sampling techniques have a positive impact on performance. Finally, GMOPTBoost significantly improves the performance of AdaBoost ensembles trained on balanced datasets.
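The contrast between arithmetic and geometric mean accuracy can be seen on a classifier that always predicts the majority class. This sketch does not reproduce GMOPTBoost or its Gaussian gradient descent; it only computes the two metrics, taking geometric mean accuracy as the geometric mean of per-class recalls (for two classes, the square root of sensitivity times specificity). The 95:5 split is hypothetical.

```python
import math


def accuracy(y_true, y_pred):
    """Arithmetic mean of per-sample correctness (ordinary accuracy)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of per-class recalls; sqrt(sensitivity * specificity) for two classes."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return math.prod(recalls) ** (1 / len(recalls))


# Hypothetical imbalanced sample: 95 non-bankrupt (0) and 5 bankrupt (1) firms
y_true = [0] * 95 + [1] * 5
always_majority = [0] * 100
print(accuracy(y_true, always_majority))                 # -> 0.95
print(geometric_mean_accuracy(y_true, always_majority))  # -> 0.0
```

Ordinary accuracy rewards ignoring the bankrupt firms entirely, while the geometric mean collapses to zero as soon as any class has zero recall.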

Ensemble Learning for Solving Data Imbalance in Bankruptcy Prediction (기업부실 예측 데이터의 불균형 문제 해결을 위한 앙상블 학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.15 no.3 / pp.1-15 / 2009
  • In a classification problem, data imbalance occurs when the number of instances in one class greatly exceeds the number of instances in the other class. Such datasets often cause a default classifier to be built because of a skewed decision boundary, which reduces the classifier's accuracy. This paper proposes Geometric Mean-based Boosting (GM-Boost) to resolve the problem of data imbalance. Because GM-Boost introduces the notion of the geometric mean, it can learn while considering both the majority and minority classes and can reinforce learning on misclassified data. An empirical study of bankruptcy prediction for Korean companies shows that GM-Boost achieves higher classification accuracy than previous methods used for imbalanced data, including under-sampling, over-sampling, and AdaBoost, and that its learning performance is robust regardless of the degree of data imbalance.
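GM-Boost is compared above against under-sampling and over-sampling. GM-Boost itself is not reproduced here; the sketch shows only the two baseline resampling ideas in their simplest random form, which is an assumption, since the abstract does not say which variants were used. The function names and the 8:2 toy data are hypothetical.

```python
import random
from collections import defaultdict


def _group_by_class(X, y):
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    return groups


def random_undersample(X, y, seed=0):
    """Drop random samples from larger classes until every class matches the smallest one."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    n_min = min(len(rows) for rows in groups.values())
    Xs, ys = [], []
    for label, rows in groups.items():
        for x in rng.sample(rows, n_min):  # sampling without replacement
            Xs.append(x)
            ys.append(label)
    return Xs, ys


def random_oversample(X, y, seed=0):
    """Duplicate random samples of smaller classes until every class matches the largest one."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    n_max = max(len(rows) for rows in groups.values())
    Xs, ys = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]  # with replacement
        for x in rows + extra:
            Xs.append(x)
            ys.append(label)
    return Xs, ys


X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2  # hypothetical 8:2 imbalance
_, y_under = random_undersample(X, y)
_, y_over = random_oversample(X, y)
print(y_under.count(0), y_under.count(1))  # -> 2 2
print(y_over.count(0), y_over.count(1))    # -> 8 8
```

Under-sampling discards majority information, while over-sampling repeats minority samples and can encourage overfitting; a boosting method that weighs both classes through the geometric mean is one way to avoid choosing between these trade-offs.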

Oversampling-Based Ensemble Learning Methods for Imbalanced Data (불균형 데이터 처리를 위한 과표본화 기반 앙상블 학습 기법)

  • Kim, Kyung-Min;Jang, Ha-Young;Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices / v.20 no.10 / pp.549-554 / 2014
  • Handwritten character recognition data are usually imbalanced because they are collected from natural language sentences written by different writers. Imbalanced data can seriously degrade the performance of most machine learning algorithms. However, this problem is typically ignored in handwritten character recognition, because most of the difficulty in that task is attributed to the high variance in the dataset and the similar shapes of different characters. We propose oversampling-based ensemble learning methods to solve the imbalanced data problem in handwritten character recognition and to improve recognition accuracy. We also show empirically that the proposed methods improve the recognition accuracy of minor classes as well as overall recognition accuracy.
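One generic form of oversampling-based ensemble learning is sketched below: each ensemble member gets its own class-balanced resample, drawn with replacement so that every class reaches the size of the largest class, and members vote on the final label. The abstract does not specify the oversampling scheme, base learner, or combination rule, so the per-class bootstrap, the k-nearest-neighbour base learner, majority voting, and all names here are assumptions, with toy 1-D data in place of handwriting features.

```python
import random
from collections import Counter, defaultdict


def balanced_bootstrap(X, y, rng):
    """Resample every class with replacement up to the size of the largest class."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    n_max = max(len(rows) for rows in groups.values())
    Xb, yb = [], []
    for label, rows in groups.items():
        for _ in range(n_max):
            Xb.append(rng.choice(rows))
            yb.append(label)
    return Xb, yb


def knn_predict(X_train, y_train, x, k=3):
    """Hypothetical base learner: k-nearest neighbours with squared Euclidean distance."""
    order = sorted(range(len(X_train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
    return Counter(y_train[i] for i in order[:k]).most_common(1)[0][0]


class OversampledEnsemble:
    """Each member stores its own balanced resample; predictions are combined by majority vote."""

    def __init__(self, n_members=5, seed=0):
        self.n_members = n_members
        self.seed = seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        self.members = [balanced_bootstrap(X, y, rng) for _ in range(self.n_members)]
        return self

    def predict(self, x):
        votes = Counter(knn_predict(Xb, yb, x) for Xb, yb in self.members)
        return votes.most_common(1)[0][0]


X = [[0], [1], [2], [3], [4], [5], [20], [21]]
y = [0] * 6 + [1] * 2  # hypothetical 6:2 imbalance
ens = OversampledEnsemble(n_members=5, seed=1).fit(X, y)
print(ens.predict([20.5]), ens.predict([2]))  # -> 1 0
```

Drawing a different resample per member adds diversity to the ensemble, while balancing each resample keeps minority classes from being outvoted inside every member.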