• Title/Summary/Keyword: Dataset Augmentation

Search Result 109, Processing Time 0.029 seconds

Classification of Leukemia Disease in Peripheral Blood Cell Images Using Convolutional Neural Network

  • Tran, Thanh;Park, Jin-Hyuk;Kwon, Oh-Heum;Moon, Kwang-Seok;Lee, Suk-Hwan;Kwon, Ki-Ryong
    • Journal of Korea Multimedia Society
    • /
    • v.21 no.10
    • /
    • pp.1150-1161
    • /
    • 2018
  • Classification is widely used in medical images to categorize patients and non-patients. However, conventional classification requires a complex procedure, including some rigid steps such as pre-processing, segmentation, feature extraction, detection, and classification. In this paper, we propose a novel convolutional neural network (CNN), called LeukemiaNet, to specifically classify two different types of leukemia, including acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML), and non-cancerous patients. To extend the limited dataset, a PCA color augmentation process is utilized before images are input into the LeukemiaNet. This augmentation method enhances the accuracy of our proposed CNN architecture from 96.9% to 97.2% for distinguishing ALL, AML, and normal cell images.

Robust Head Pose Estimation for Masked Face Image via Data Augmentation (데이터 증강을 통한 마스크 착용 얼굴 이미지에 강인한 얼굴 자세추정)

  • Kyeongtak, Han;Sungeun, Hong
    • Journal of Broadcast Engineering
    • /
    • v.27 no.6
    • /
    • pp.944-947
    • /
    • 2022
  • Due to the coronavirus pandemic, the wearing of a mask has been increasing worldwide; thus, the importance of image analysis on masked face images has become essential. Although head pose estimation can be applied to various face-related applications including driver attention, face frontalization, and gaze detection, few studies have been conducted to address the performance degradation caused by masked faces. This study proposes a new data augmentation that synthesizes the masked face, depending on the face image size and poses, which shows robust performance on BIWI benchmark dataset regardless of mask-wearing. Since the proposed scheme is not limited to the specific model, it can be utilized in various head pose estimation models.

Steel Surface Defect Detection using the RetinaNet Detection Model

  • Sharma, Mansi;Lim, Jong-Tae;Chae, Yi-Geun
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.14 no.2
    • /
    • pp.136-146
    • /
    • 2022
  • Some surface defects make the weak quality of steel materials. To limit these defects, we advocate a one-stage detector model RetinaNet among diverse detection algorithms in deep learning. There are several backbones in the RetinaNet model. We acknowledged two backbones, which are ResNet50 and VGG19. To validate our model, we compared and analyzed several traditional models, one-stage models like YOLO and SSD models and two-stage models like Faster-RCNN, EDDN, and Xception models, with simulations based on steel individual classes. We also performed the correlation of the time factor between one-stage and two-stage models. Comparative analysis shows that the proposed model achieves excellent results on the dataset of the Northeastern University surface defect detection dataset. We would like to work on different backbones to check the efficiency of the model for real world, increasing the datasets through augmentation and focus on improving our limitation.

Calibration for Gingivitis Binary Classifier via Epoch-wise Decaying Label-Smoothing (라벨 스무딩을 활용한 치은염 이진 분류기 캘리브레이션)

  • Lee, Sanghyun
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2021.10a
    • /
    • pp.594-596
    • /
    • 2021
  • Future healthcare systems will heavily rely on ill-labeled data due to scarcity of the experts who are trained enough to label the data. Considering the contamination of the dataset, it is not desirable to make the neural network being overconfident to the dataset, but rather giving them some margins for the prediction is preferable. In this paper, we propose a novel epoch-wise decaying label-smoothing function to alleviate the model over-confidency, and it outperforms the neural network trained with conventional cross entropy by 6.0%.

  • PDF

Detection of Anomaly Lung Sound using Deep Temporal Feature Extraction (깊은 시계열 특성 추출을 이용한 폐 음성 이상 탐지)

  • Kim-Ngoc T. Le;Gyurin Byun;Hyunseung Choo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.11a
    • /
    • pp.605-607
    • /
    • 2023
  • Recent research has highlighted the effectiveness of Deep Learning (DL) techniques in automating the detection of lung sound anomalies. However, the available lung sound datasets often suffer from limitations in both size and balance, prompting DL methods to employ data preprocessing such as augmentation and transfer learning techniques. These strategies, while valuable, contribute to the increased complexity of DL models and necessitate substantial training memory. In this study, we proposed a streamlined and lightweight DL method but effectively detects lung sound anomalies from small and imbalanced dataset. The utilization of 1D dilated convolutional neural networks enhances sensitivity to lung sound anomalies by efficiently capturing deep temporal features and small variations. We conducted a comprehensive evaluation of the ICBHI dataset and achieved a notable improvement over state-of-the-art results, increasing the average score of sensitivity and specificity metrics by 2.7%.

Securing SCADA Systems: A Comprehensive Machine Learning Approach for Detecting Reconnaissance Attacks

  • Ezaz Aldahasi;Talal Alkharobi
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.12
    • /
    • pp.1-12
    • /
    • 2023
  • Ensuring the security of Supervisory Control and Data Acquisition (SCADA) and Industrial Control Systems (ICS) is paramount to safeguarding the reliability and safety of critical infrastructure. This paper addresses the significant threat posed by reconnaissance attacks on SCADA/ICS networks and presents an innovative methodology for enhancing their protection. The proposed approach strategically employs imbalance dataset handling techniques, ensemble methods, and feature engineering to enhance the resilience of SCADA/ICS systems. Experimentation and analysis demonstrate the compelling efficacy of our strategy, as evidenced by excellent model performance characterized by good precision, recall, and a commendably low false negative (FN). The practical utility of our approach is underscored through the evaluation of real-world SCADA/ICS datasets, showcasing superior performance compared to existing methods in a comparative analysis. Moreover, the integration of feature augmentation is revealed to significantly enhance detection capabilities. This research contributes to advancing the security posture of SCADA/ICS environments, addressing a critical imperative in the face of evolving cyber threats.

CNN-based Weighted Ensemble Technique for ImageNet Classification (대용량 이미지넷 인식을 위한 CNN 기반 Weighted 앙상블 기법)

  • Jung, Heechul;Choi, Min-Kook;Kim, Junkwang;Kwon, Soon;Jung, Wooyoung
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.15 no.4
    • /
    • pp.197-204
    • /
    • 2020
  • The ImageNet dataset is a large scale dataset and contains various natural scene images. In this paper, we propose a convolutional neural network (CNN)-based weighted ensemble technique for the ImageNet classification task. First, in order to fuse several models, our technique uses weights for each model, unlike the existing average-based ensemble technique. Then we propose an algorithm that automatically finds the coefficients used in later ensemble process. Our algorithm sequentially selects the model with the best performance of the validation set, and then obtains a weight that improves performance when combined with existing selected models. We applied the proposed algorithm to a total of 13 heterogeneous models, and as a result, 5 models were selected. These selected models were combined with weights, and we achieved 3.297% Top-5 error rate on the ImageNet test dataset.

A Study on Improving Performance of Software Requirements Classification Models by Handling Imbalanced Data (불균형 데이터 처리를 통한 소프트웨어 요구사항 분류 모델의 성능 개선에 관한 연구)

  • Jong-Woo Choi;Young-Jun Lee;Chae-Gyun Lim;Ho-Jin Choi
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.7
    • /
    • pp.295-302
    • /
    • 2023
  • Software requirements written in natural language may have different meanings from the stakeholders' viewpoint. When designing an architecture based on quality attributes, it is necessary to accurately classify quality attribute requirements because the efficient design is possible only when appropriate architectural tactics for each quality attribute are selected. As a result, although many natural language processing models have been studied for the classification of requirements, which is a high-cost task, few topics improve classification performance with the imbalanced quality attribute datasets. In this study, we first show that the classification model can automatically classify the Korean requirement dataset through experiments. Based on these results, we explain that data augmentation through EDA(Easy Data Augmentation) techniques and undersampling strategies can improve the imbalance of quality attribute datasets, and show that they are effective in classifying requirements. The results improved by 5.24%p on F1-score, indicating that handling imbalanced data helps classify Korean requirements of classification models. Furthermore, detailed experiments of EDA illustrate operations that help improve classification performance.

The Performance Improvement of U-Net Model for Landcover Semantic Segmentation through Data Augmentation (데이터 확장을 통한 토지피복분류 U-Net 모델의 성능 개선)

  • Baek, Won-Kyung;Lee, Moung-Jin;Jung, Hyung-Sup
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_2
    • /
    • pp.1663-1676
    • /
    • 2022
  • Recently, a number of deep-learning based land cover segmentation studies have been introduced. Some studies denoted that the performance of land cover segmentation deteriorated due to insufficient training data. In this study, we verified the improvement of land cover segmentation performance through data augmentation. U-Net was implemented for the segmentation model. And 2020 satellite-derived landcover dataset was utilized for the study data. The pixel accuracies were 0.905 and 0.923 for U-Net trained by original and augmented data respectively. And the mean F1 scores of those models were 0.720 and 0.775 respectively, indicating the better performance of data augmentation. In addition, F1 scores for building, road, paddy field, upland field, forest, and unclassified area class were 0.770, 0.568, 0.433, 0.455, 0.964, and 0.830 for the U-Net trained by original data. It is verified that data augmentation is effective in that the F1 scores of every class were improved to 0.838, 0.660, 0.791, 0.530, 0.969, and 0.860 respectively. Although, we applied data augmentation without considering class balances, we find that data augmentation can mitigate biased segmentation performance caused by data imbalance problems from the comparisons between the performances of two models. It is expected that this study would help to prove the importance and effectiveness of data augmentation in various image processing fields.

Remote Sensing Image Classification for Land Cover Mapping in Developing Countries: A Novel Deep Learning Approach

  • Lynda, Nzurumike Obianuju;Nnanna, Nwojo Agwu;Boukar, Moussa Mahamat
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.2
    • /
    • pp.214-222
    • /
    • 2022
  • Convolutional Neural networks (CNNs) are a category of deep learning networks that have proven very effective in computer vision tasks such as image classification. Notwithstanding, not much has been seen in its use for remote sensing image classification in developing countries. This is majorly due to the scarcity of training data. Recently, transfer learning technique has successfully been used to develop state-of-the art models for remote sensing (RS) image classification tasks using training and testing data from well-known RS data repositories. However, the ability of such model to classify RS test data from a different dataset has not been sufficiently investigated. In this paper, we propose a deep CNN model that can classify RS test data from a dataset different from the training dataset. To achieve our objective, we first, re-trained a ResNet-50 model using EuroSAT, a large-scale RS dataset to develop a base model then we integrated Augmentation and Ensemble learning to improve its generalization ability. We further experimented on the ability of this model to classify a novel dataset (Nig_Images). The final classification results shows that our model achieves a 96% and 80% accuracy on EuroSAT and Nig_Images test data respectively. Adequate knowledge and usage of this framework is expected to encourage research and the usage of deep CNNs for land cover mapping in cases of lack of training data as obtainable in developing countries.