• 제목/요약/키워드: Imbalance dataset

검색결과 58건 처리시간 0.021초

교대 근무자의 연령에 따른 건강 문제, 감정적 위험요인 노출, 일-생활 불균형, 근로환경 만족도 특성 분석 (Analysis of Working Conditions of Shift Workers by Age: Health Problems, Emotional Hazard Exposures, Work & Life Imbalance, and Satisfaction of Working Conditions)

  • 정이훈
    • 한국안전학회지
    • /
    • 제37권5호
    • /
    • pp.62-73
    • /
    • 2022
  • This study investigates the working conditions of shift workers according to age group by analyzing the sixth Korean Working Conditions Survey's data. A total of 1,323 shift workers were extracted from the dataset. Three age groups (A: 20s-30s, B: 40s-50s, C: 60s and above) were statistically compared in terms of health problems, emotional hazard exposure, work-life imbalance, and satisfaction with working conditions. Elderly shift workers (those in their 60s and above) had significantly more severe health problems and work-life imbalance, greater exposure to emotional hazards, and lower satisfaction with working conditions than young shift workers (those in their 20s-50s). The study's findings reveal the characteristics of working conditions for elderly shift workers and would be useful for improving shift workers' quality of life, as well as safety and productivity in the workplace.

Crack Detection Method for Tunnel Lining Surfaces using Ternary Classifier

  • Han, Jeong Hoon;Kim, In Soo;Lee, Cheol Hee;Moon, Young Shik
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제14권9호
    • /
    • pp.3797-3822
    • /
    • 2020
  • The inspection of cracks on the surface of tunnel linings is a common method of evaluate the condition of the tunnel. In particular, determining the thickness and shape of a crack is important because it indicates the external forces applied to the tunnel and the current condition of the concrete structure. Recently, several automatic crack detection methods have been proposed to identify cracks using captured tunnel lining images. These methods apply an image-segmentation mechanism with well-annotated datasets. However, generating the ground truths requires many resources, and the small proportion of cracks in the images cause a class-imbalance problem. A weakly annotated dataset is generated to reduce resource consumption and avoid the class-imbalance problem. However, the use of the dataset results in a large number of false positives and requires post-processing for accurate crack detection. To overcome these issues, we propose a crack detection method using a ternary classifier. The proposed method significantly reduces the false positive rate, and the performance (as measured by the F1 score) is improved by 0.33 compared to previous methods. These results demonstrate the effectiveness of the proposed method.

망막 영상 분석을 위한 두 갈래 분류기 (Two-Branch Classifier for Retinal Imaging Analysis)

  • 오영택;박현진
    • 한국정보통신학회:학술대회논문집
    • /
    • 한국정보통신학회 2021년도 춘계학술대회
    • /
    • pp.614-616
    • /
    • 2021
  • 세계는 안구 질병 치료, 시력 회복 서비스, 훈련된 안과 전문의의 부족 등 안과 측면에서 어려움에 직면해 있다. 안구 병리를 조기에 발견하고 진단하면 시각 장애를 예방할 수 있다. 하지만 기존의 망막 영상 공개 데이터 세트는 임상에서 발견되는 다양한 질병으로 구성되어 있지 않기 때문에 다양한 안구 질환을 분류하는 방법을 개발하기가 어렵다. 본 연구는 2021 ISBI challenge에서 공개된 데이터 세트인 Retinal Fundus Multi-disease Image Dataset (RFMiD) 을 이용하여 안구 질환을 분류하는 방법을 제안한다. 본 연구의 목표는 망막 이미지를 정상, 비정상 범주로 선별하기 위한 강력하고 일반화 가능한 모델을 개발하는 것이다. 제안된 모델의 성능은 수신자 조작 특성 곡선 아래 면적 점수로 비공개 테스트 데이터 세트에 대해 0.9782의 값을 보여준다.

  • PDF

Centroid and Nearest Neighbor based Class Imbalance Reduction with Relevant Feature Selection using Ant Colony Optimization for Software Defect Prediction

  • B., Kiran Kumar;Gyani, Jayadev;Y., Bhavani;P., Ganesh Reddy;T, Nagasai Anjani Kumar
    • International Journal of Computer Science & Network Security
    • /
    • 제22권10호
    • /
    • pp.1-10
    • /
    • 2022
  • Nowadays software defect prediction (SDP) is most active research going on in software engineering. Early detection of defects lowers the cost of the software and also improves reliability. Machine learning techniques are widely used to create SDP models based on programming measures. The majority of defect prediction models in the literature have problems with class imbalance and high dimensionality. In this paper, we proposed Centroid and Nearest Neighbor based Class Imbalance Reduction (CNNCIR) technique that considers dataset distribution characteristics to generate symmetry between defective and non-defective records in imbalanced datasets. The proposed approach is compared with SMOTE (Synthetic Minority Oversampling Technique). The high-dimensionality problem is addressed using Ant Colony Optimization (ACO) technique by choosing relevant features. We used nine different classifiers to analyze six open-source software defect datasets from the PROMISE repository and seven performance measures are used to evaluate them. The results of the proposed CNNCIR method with ACO based feature selection reveals that it outperforms SMOTE in the majority of cases.

불균형 블랙박스 동영상 데이터에서 충돌 상황의 다중 분류를 위한 손실 함수 비교 (Comparison of Loss Function for Multi-Class Classification of Collision Events in Imbalanced Black-Box Video Data)

  • 이의상;한석민
    • 한국인터넷방송통신학회논문지
    • /
    • 제24권1호
    • /
    • pp.49-54
    • /
    • 2024
  • 데이터 불균형은 분류 문제에서 흔히 마주치는 문제로, 데이터셋 내의 클래스간 샘플 수의 현저한 차이에서 기인한다. 이러한 데이터 불균형은 일반적으로 분류 모델에서 과적합, 과소적합, 성능 지표의 오해 등의 문제를 야기한다. 이를 해결하기 위한 방법으로는 Resampling, Augmentation, 규제 기법, 손실 함수 조정 등이 있다. 본 논문에서는 손실 함수 조정에 대해 다루며 특히, 불균형 문제를 가진 Multi-Class 블랙박스 동영상 데이터에서 여러 구성의 손실 함수(Cross Entropy, Balanced Cross Entropy, 두 가지 Focal Loss 설정: 𝛼 = 1 및 𝛼 = Balanced, Asymmetric Loss)의 성능을 I3D, R3D_18 모델을 활용하여 비교하였다.

Heterogeneous Ensemble of Classifiers from Under-Sampled and Over-Sampled Data for Imbalanced Data

  • Kang, Dae-Ki;Han, Min-gyu
    • International journal of advanced smart convergence
    • /
    • 제8권1호
    • /
    • pp.75-81
    • /
    • 2019
  • Data imbalance problem is common and causes serious problem in machine learning process. Sampling is one of the effective methods for solving data imbalance problem. Over-sampling increases the number of instances, so when over-sampling is applied in imbalanced data, it is applied to minority instances. Under-sampling reduces instances, which usually is performed on majority data. We apply under-sampling and over-sampling to imbalanced data and generate sampled data sets. From the generated data sets from sampling and original data set, we construct a heterogeneous ensemble of classifiers. We apply five different algorithms to the heterogeneous ensemble. Experimental results on an intrusion detection dataset as an imbalanced datasets show that our approach shows effective results.

Securing SCADA Systems: A Comprehensive Machine Learning Approach for Detecting Reconnaissance Attacks

  • Ezaz Aldahasi;Talal Alkharobi
    • International Journal of Computer Science & Network Security
    • /
    • 제23권12호
    • /
    • pp.1-12
    • /
    • 2023
  • Ensuring the security of Supervisory Control and Data Acquisition (SCADA) and Industrial Control Systems (ICS) is paramount to safeguarding the reliability and safety of critical infrastructure. This paper addresses the significant threat posed by reconnaissance attacks on SCADA/ICS networks and presents an innovative methodology for enhancing their protection. The proposed approach strategically employs imbalance dataset handling techniques, ensemble methods, and feature engineering to enhance the resilience of SCADA/ICS systems. Experimentation and analysis demonstrate the compelling efficacy of our strategy, as evidenced by excellent model performance characterized by good precision, recall, and a commendably low false negative (FN). The practical utility of our approach is underscored through the evaluation of real-world SCADA/ICS datasets, showcasing superior performance compared to existing methods in a comparative analysis. Moreover, the integration of feature augmentation is revealed to significantly enhance detection capabilities. This research contributes to advancing the security posture of SCADA/ICS environments, addressing a critical imperative in the face of evolving cyber threats.

Enhancing E-commerce Security: A Comprehensive Approach to Real-Time Fraud Detection

  • Sara Alqethami;Badriah Almutanni;Walla Aleidarousr
    • International Journal of Computer Science & Network Security
    • /
    • 제24권4호
    • /
    • pp.1-10
    • /
    • 2024
  • In the era of big data, the growth of e-commerce transactions brings forth both opportunities and risks, including the threat of data theft and fraud. To address these challenges, an automated real-time fraud detection system leveraging machine learning was developed. Four algorithms (Decision Tree, Naïve Bayes, XGBoost, and Neural Network) underwent comparison using a dataset from a clothing website that encompassed both legitimate and fraudulent transactions. The dataset exhibited an imbalance, with 9.3% representing fraud and 90.07% legitimate transactions. Performance evaluation metrics, including Recall, Precision, F1 Score, and AUC ROC, were employed to assess the effectiveness of each algorithm. XGBoost emerged as the top-performing model, achieving an impressive accuracy score of 95.85%. The proposed system proves to be a robust defense mechanism against fraudulent activities in e-commerce, thereby enhancing security and instilling trust in online transactions.

딥러닝 기반 실내 디자인 인식 (Deep Learning-based Interior Design Recognition)

  • 이원규;박지훈;이종혁;정희철
    • 대한임베디드공학회논문지
    • /
    • 제19권1호
    • /
    • pp.47-55
    • /
    • 2024
  • We spend a lot of time in indoor space, and the space has a huge impact on our lives. Interior design plays a significant role to make an indoor space attractive and functional. However, it should consider a lot of complex elements such as color, pattern, and material etc. With the increasing demand for interior design, there is a growing need for technologies that analyze these design elements accurately and efficiently. To address this need, this study suggests a deep learning-based design analysis system. The proposed system consists of a semantic segmentation model that classifies spatial components and an image classification model that classifies attributes such as color, pattern, and material from the segmented components. Semantic segmentation model was trained using a dataset of 30000 personal indoor interior images collected for research, and during inference, the model separate the input image pixel into 34 categories. And experiments were conducted with various backbones in order to obtain the optimal performance of the deep learning model for the collected interior dataset. Finally, the model achieved good performance of 89.05% and 0.5768 in terms of accuracy and mean intersection over union (mIoU). In classification part convolutional neural network (CNN) model which has recorded high performance in other image recognition tasks was used. To improve the performance of the classification model we suggests an approach that how to handle data that has data imbalance and vulnerable to light intensity. Using our methods, we achieve satisfactory results in classifying interior design component attributes. In this paper, we propose indoor space design analysis system that automatically analyzes and classifies the attributes of indoor images using a deep learning-based model. This analysis system, used as a core module in the A.I interior recommendation service, can help users pursuing self-interior design to complete their designs more easily and efficiently.

Image-to-Image Translation with GAN for Synthetic Data Augmentation in Plant Disease Datasets

  • Nazki, Haseeb;Lee, Jaehwan;Yoon, Sook;Park, Dong Sun
    • 스마트미디어저널
    • /
    • 제8권2호
    • /
    • pp.46-57
    • /
    • 2019
  • In recent research, deep learning-based methods have achieved state-of-the-art performance in various computer vision tasks. However, these methods are commonly supervised, and require huge amounts of annotated data to train. Acquisition of data demands an additional costly effort, particularly for the tasks where it becomes challenging to obtain large amounts of data considering the time constraints and the requirement of professional human diligence. In this paper, we present a data level synthetic sampling solution to learn from small and imbalanced data sets using Generative Adversarial Networks (GANs). The reason for using GANs are the challenges posed in various fields to manage with the small datasets and fluctuating amounts of samples per class. As a result, we present an approach that can improve learning with respect to data distributions, reducing the partiality introduced by class imbalance and hence shifting the classification decision boundary towards more accurate results. Our novel method is demonstrated on a small dataset of 2789 tomato plant disease images, highly corrupted with class imbalance in 9 disease categories. Moreover, we evaluate our results in terms of different metrics and compare the quality of these results for distinct classes.