• Title/Summary/Keyword: Imbalance Problem

Study on Lifelog Anomaly Detection using VAE-based Machine Learning Model (VAE(Variational AutoEncoder) 기반 머신러닝 모델을 활용한 체중 라이프로그 이상탐지에 관한 연구)

  • Kim, Jiyong;Park, Minseo
    • The Journal of the Convergence on Culture Technology / v.8 no.4 / pp.91-98 / 2022
  • Lifelog data collected continuously through a wearable device may contain many outliers, so outliers must be found and removed to improve data quality. Because outliers are generally fewer than normal data points, a class imbalance problem occurs. To solve this imbalance problem, we propose a method that applies a Variational AutoEncoder (VAE) to the outliers. After the outlier data are preprocessed with the proposed method, the result is verified with several machine learning classification models. Verification on body weight data confirmed that performance improved in all classification models. Based on the experimental results, for the analysis of lifelog body weight data we propose preprocessing the data with the outlier processing method proposed in this study and then applying the LightGBM model, which performed best.

Classification Abnormal temperatures based on Meteorological Environment using Random forests (랜덤포레스트를 이용한 기상 환경에 따른 이상기온 분류)

  • Youn Su Kim;Kwang Yoon Song;In Hong Chang
    • Journal of Integrative Natural Science / v.17 no.1 / pp.1-12 / 2024
  • Many abnormal climate events are occurring around the world, and their causes are related to temperature. Factors that affect temperature include excessive emissions of carbon and greenhouse gases at the global scale and air circulation at the local scale. Because of air circulation, abnormal climate phenomena such as abnormally high and abnormally low temperatures are occurring in certain areas and can cause serious harm to people. The problem of abnormal temperature should therefore not be approached only as a case of climate change but studied as a new category of climate crisis. In this study, we propose a random forest model for classifying abnormal temperature, based on various meteorological data (such as longitudinal observations, yellow dust, and ultraviolet radiation) from 2018 to 2022 for each region in Korea. The meteorological data were imbalanced, so the imbalance was addressed by oversampling. We found that the variables affecting abnormal temperature differ by region. In particular, the central and southern regions are influenced by high pressure (Mainland China, Siberian high pressure, and North Pacific high pressure) because of their regional characteristics, so pressure-related variables had a significant impact on the classification of abnormal temperature. This suggests that a regional approach can be taken to predict abnormal temperatures from the surrounding meteorological environment. In addition, it appears possible to take preventive measures in advance of an abnormal temperature event according to regional characteristics.
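The abstract above does not say which oversampling variant was used. As a minimal sketch of the general idea, assuming plain random oversampling (duplicating minority-class rows until the classes are the same size), the step might look like the following. The function name `random_oversample` and the toy pressure/temperature rows are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (assumed variant, not the paper's)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Only original rows of this class are eligible for duplication.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: [pressure in hPa, temperature in deg C] per day.
rows = [[1012.0, 21.5], [1010.5, 22.0], [1011.2, 20.8], [1013.1, 21.1],
        [1009.8, 22.4], [1012.6, 21.9], [1028.4, 35.2], [1030.1, 36.0]]
labels = ["normal"] * 6 + ["abnormal"] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.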

KNN-Based Automatic Cropping for Improved Threat Object Recognition in X-Ray Security Images

  • Dumagpi, Joanna Kazzandra;Jung, Woo-Young;Jeong, Yong-Jin
    • Journal of IKEEE / v.23 no.4 / pp.1134-1139 / 2019
  • One of the most important applications of computer vision algorithms is the detection of threat objects in x-ray security images. In practical settings, however, this task is complicated by two properties inherent to the dataset: class imbalance and visual complexity. In our previous work, we resolved the class imbalance problem by using GAN-based anomaly detection to balance out the bias induced by training a classification model on a non-practical dataset. In this paper, we propose a new method that alleviates the visual complexity problem by using a KNN-based automatic cropping algorithm to remove distracting and irrelevant information from the x-ray images. We use the cropped images as inputs to our current model. Empirical results show substantial improvement to our model, e.g. about 3% on the practical dataset, thus further outperforming previous approaches, which is critical for security-based applications.

Optimization of Uneven Margin SVM to Solve Class Imbalance in Bankruptcy Prediction (비대칭 마진 SVM 최적화 모델을 이용한 기업부실 예측모형의 범주 불균형 문제 해결)

  • Sung Yim Jo;Myoung Jong Kim
    • Information Systems Review / v.24 no.4 / pp.23-40 / 2022
  • Although the Support Vector Machine (SVM) has been used in various fields, including bankruptcy prediction, the hyperplane learned by an SVM under class imbalance can be severely skewed toward the minority class, which degrades performance because the region of the majority class expands while the region of the minority class is invaded. This study proposed an optimized uneven margin SVM (OPT-UMSVM) that combines a threshold moving or post scaling method with UMSVM to overcome the limitation of the traditional even margin SVM (EMSVM) under class imbalance. OPT-UMSVM readjusted the skewed hyperplane toward the majority class and had better generalization ability than EMSVM, improving the sensitivity for the minority class and yielding optimized performance. To validate OPT-UMSVM, 10-fold cross validation was performed on five sub-datasets with different imbalance ratios. The empirical results showed two main findings. First, UMSVM did little to improve on EMSVM in balanced datasets, but it greatly outperformed EMSVM in severely imbalanced datasets. Second, compared with EMSVM and conventional UMSVM, OPT-UMSVM performed better in both balanced and imbalanced datasets, with a significant performance difference especially in severely imbalanced datasets.
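Threshold moving, one of the ingredients named above, can be illustrated independently of any particular SVM. The sketch below is not the authors' OPT-UMSVM. It assumes that decision scores are already available from some classifier and that the cut-off is chosen to maximize the geometric mean of sensitivity and specificity. Both the criterion and the toy scores are assumptions made for illustration.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall on class 1) and specificity."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sens * spec)


def best_threshold(scores, y_true):
    """Try each observed score as the cut-off; keep the one with the highest G-mean."""
    best_t, best_g = 0.0, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        g = g_mean(y_true, preds)
        if g > best_g:
            best_t, best_g = t, g
    return best_t, best_g


# Hypothetical decision scores; label 1 is the minority (e.g. bankrupt) class.
scores = [-1.8, -1.5, -1.2, -1.1, -0.9, -0.7, -0.6, -0.4, -0.3, 0.2]
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]

default_preds = [1 if s >= 0.0 else 0 for s in scores]
threshold, moved_g = best_threshold(scores, y_true)
print("G-mean at threshold 0.0:", round(g_mean(y_true, default_preds), 3))
print("G-mean at threshold", threshold, ":", round(moved_g, 3))
```

In practice the threshold would be tuned on a validation fold rather than on the data used for the final evaluation.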

Manufacturing System of Centrifugal Cast Metal Bearing by Dehydrogenation (탈수소 열처리 공정에 의한 원심주조 메탈베어링의 제조 시스템)

  • Kim, Jeung-Hun;Kim, Chung-Gu;Byen, Jea-Young;Lee, Eun-Suk;Yang, Ji-Yung;Choi, Won-Sik
    • Journal of the Korean Society of Manufacturing Process Engineers / v.19 no.5 / pp.111-117 / 2020
  • Centrifugal casting is suitable for producing hollow products using centrifugal force. Bush-type metal bearings are key parts that facilitate the rotational movement of various machinery. Metal bearings produced by conventional centrifugal casting machines show rotational imbalance. Therefore, a large amount of material is injected and the product's precision is then secured in secondary processing. Rotational imbalance is caused by the force acting on the rotary disc plate. To minimize rotational imbalance, NASTRAN was used for optimal design and structural analysis. It was concluded that the rotating plate of the conventional centrifugal casting machine should be prevented from tilting. For this purpose, the location and thickness of the stiffeners were obtained through the optimum design. In the conventional centrifugal casting machine, both ends of the product are lower in temperature than the center, so internal stress occurs. This problem is solved by inserting a heating coil into the rotating plate.

A Study on the Types of the Displacement and Damage of Wooden Architectural Cultural Assets (목조건축문화재에 있어서 변위 및 손상 유형에 관한 연구)

  • Shin, Byeong-Uk
    • Journal of the Korean Institute of Rural Architecture / v.21 no.3 / pp.25-32 / 2019
  • This study derives the types of displacement and damage that occur in wooden architectural cultural assets. Although wooden architectural cultural assets are repaired through continuous maintenance, secondary problems frequently occur because the root cause of the problem has yet to be solved. The types of displacement and damage that occur in wooden architectural cultural assets are classified by three parts: the foundation section, the gagu section, and the roof section. The three main factors that lead to displacement and damage are load impact on the structure, durability deterioration, and imbalance. Load impact is a phenomenon in which a member is subjected to a load that causes deformation or cracks. Durability deterioration is a natural phenomenon that reduces the performance of lumber as a result of check shake, termite damage, and decay. Imbalance is a condition in which the lumber is twisted and the force balance is lost, due to either drying shrinkage or displacement of the gagu section.

Consensus-Based Distributed Algorithm for Optimal Resource Allocation of Power Network under Supply-Demand Imbalance (수급 불균형을 고려한 전력망의 최적 자원 할당을 위한 일치 기반의 분산 알고리즘)

  • Young-Hun, Lim
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.15 no.6 / pp.440-448 / 2022
  • With the recent introduction of distributed energy resources, the optimal resource allocation problem of the power network has become increasingly important, and distributed resource allocation methods are required to process the huge amounts of data in large-scale power networks. Many studies of the optimal resource allocation problem have addressed the case in which the supply-demand balance is satisfied under the generation capacity limits of each generator, but the supply-demand imbalance case, in which total demand exceeds the maximum generation capacity, has rarely been considered. In this paper, we propose a consensus-based distributed algorithm for the optimal resource allocation of a power network that considers the supply-demand imbalance condition as well as the supply-demand balance condition. The proposed distributed algorithm is designed to allocate the optimal resources when the supply-demand balance condition is satisfied, and to measure the amount of required resources when supply and demand are imbalanced. Finally, we conduct simulations to verify the performance of the proposed algorithm.
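The abstract does not give the update rule of the proposed algorithm. As a rough illustration of the consensus building block only, the sketch below runs a standard average-consensus iteration in which each node repeatedly moves toward its neighbors' values. It is not the paper's algorithm. The four-node line network, the step size, and the local mismatch values (demand minus maximum capacity) are all hypothetical.

```python
def average_consensus(values, neighbors, step=0.3, iterations=200):
    """Synchronous average consensus on an undirected graph.

    Each node i updates x_i <- x_i + step * sum_j (x_j - x_i) over its
    neighbors j. For a connected graph and step < 1 / (max degree), every
    x_i converges to the average of the initial values.
    """
    x = list(values)
    for _ in range(iterations):
        # The comprehension reads only the previous iterate, so the update is synchronous.
        x = [xi + step * sum(x[j] - xi for j in neighbors[i])
             for i, xi in enumerate(x)]
    return x


# Hypothetical 4-node line network: 0 - 1 - 2 - 3.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

# Hypothetical local mismatch per node: demand minus maximum generation capacity.
local_mismatch = [5.0, -2.0, 4.0, 1.0]

estimates = average_consensus(local_mismatch, neighbors)
# A positive network-wide average means total demand exceeds total capacity.
print([round(e, 4) for e in estimates])
```

Each node uses only information from its direct neighbors, which is what makes the scheme distributed. Multiplying the agreed average by the number of nodes recovers the total shortfall.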

Application and Comparison of Data Mining Technique to Prevent Metal-Bush Omission (메탈부쉬 누락예방을 위한 데이터마이닝 기법의 적용 및 비교)

  • Sang-Hyun Ko;Dongju Lee
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.3 / pp.139-147 / 2023
  • The metal bush assembly process inserts and compresses a metal bush that serves to reduce the occurrence of noise and ensure stable compression in the rotating section. In this process, head diameter defects and placement defects of the metal bush occur because of metal bush omission, non-pressing, and poor press-fitting. Among these causes, this study aims to prevent defects due to omission of the metal bush by using signals from sensors attached to the facility. In particular, metal bush omission is predicted with various data mining techniques using the left load cell value, right load cell value, current, and voltage as independent variables. Defect data for metal bush omission are difficult to obtain, which results in data imbalance. Data imbalance refers to a large difference in the number of data points belonging to each class, which can be a problem when performing classification prediction. To solve the problem caused by data imbalance, oversampling and composite sampling techniques were applied in this study. In addition, simulated annealing (SA) was applied to optimize the sampling parameters and the hyper-parameters of the data mining techniques used for bush omission prediction. Metal bush omission was predicted using actual data from manufacturing company M, and the classification performance was examined. All applied techniques showed excellent results; in particular, the proposed methods, which combine Random Forest with SA and MLP with SA, showed better results.

Crack Detection Method for Tunnel Lining Surfaces using Ternary Classifier

  • Han, Jeong Hoon;Kim, In Soo;Lee, Cheol Hee;Moon, Young Shik
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.9 / pp.3797-3822 / 2020
  • The inspection of cracks on the surface of tunnel linings is a common method of evaluating the condition of a tunnel. In particular, determining the thickness and shape of a crack is important because it indicates the external forces applied to the tunnel and the current condition of the concrete structure. Recently, several automatic crack detection methods have been proposed to identify cracks using captured tunnel lining images. These methods apply an image-segmentation mechanism with well-annotated datasets. However, generating the ground truths requires many resources, and the small proportion of cracks in the images causes a class-imbalance problem. A weakly annotated dataset is generated to reduce resource consumption and avoid the class-imbalance problem. However, the use of this dataset results in a large number of false positives and requires post-processing for accurate crack detection. To overcome these issues, we propose a crack detection method using a ternary classifier. The proposed method significantly reduces the false positive rate, and the performance (as measured by the F1 score) is improved by 0.33 compared to previous methods. These results demonstrate the effectiveness of the proposed method.
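For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall, so it drops sharply when false positives are numerous. The sketch below uses hypothetical counts that are not taken from the paper. It shows how removing false positives raises F1 while recall stays fixed.

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall), from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: same true positives and misses, different false positives.
many_false_positives = f1_score(tp=80, fp=120, fn=20)
few_false_positives = f1_score(tp=80, fp=20, fn=20)

print(round(many_false_positives, 3), round(few_false_positives, 3))
```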

The Characteristics and Perspectives of Industrial Technology Labor-force by Technology Intensities in Korean Manufacturing (기술집약도별 산업기술인력 수급구조의 특징과 정책적 시사점)

  • Hong, Seong-Min;Jang, Seon-Mi
    • Journal of Technology Innovation / v.16 no.2 / pp.201-223 / 2008
  • This paper studies the supply and demand of the Industrial Technology Labor-force (ITL) and analyzes the determinants of the ITL shortage in Korean manufacturing. Following the OECD, we classified industries into four categories by R&D intensity: high technology, medium-high technology, medium-low technology, and low technology industries. For the empirical analyses we use survey data collected from 5,703 enterprises. The key findings are as follows. First, a large majority of the ITL is engaged in the more technology-intensive industries, but the categories exposed to a more serious labor-force shortage are the medium-high technology and low technology industries. Second, on the supply side, the ITL shortage is mainly due to the avoidance of ITL jobs; on the demand side, the reason is that most of the ITL are not researchers but production managers. Third, the causes of the imbalance between supply and demand of ITL differ by technological category. For example, in the high technology industries, supply factors such as average wage and turnover rate played a more important role in the imbalance, whereas in the low technology industries, demand factors such as per capita sales and the ratio of ITL to all employees were relatively much more important. Based on the findings, we derive policy implications, such as the need to plan various policies to resolve the ITL shortage according to technological category.