• Title/Summary/Keyword: Imbalance dataset

Search Result 58, Processing Time 0.02 seconds

Measurements of the Hepatectomy Rate and Regeneration Rate Using Deep Learning in CT Scan of Living Donors (딥러닝을 이용한 CT 영상에서 생체 공여자의 간 절제율 및 재생률 측정)

  • Sae Byeol, Mun;Young Jae, Kim;Won-Suk, Lee;Kwang Gi, Kim
    • Journal of Biomedical Engineering Research
    • /
    • v.43 no.6
    • /
    • pp.434-440
    • /
    • 2022
  • Liver transplantation is a critical used treatment method for patients with end-stage liver disease. The number of cases of living donor liver transplantation is increasing due to the imbalance in needs and supplies for brain-dead organ donation. As a result, the importance of the accuracy of the donor's suitability evaluation is also increasing rapidly. To measure the donor's liver volume accurately is the most important, that is absolutely necessary for the recipient's postoperative progress and the donor's safety. Therefore, we propose liver segmentation in abdominal CT images from pre-operation, POD 7, and POD 63 with a two-dimensional U-Net. In addition, we introduce an algorithm to measure the volume of the segmented liver and measure the hepatectomy rate and regeneration rate of pre-operation, POD 7, and POD 63. The performance for the learning model shows the best results in the images from pre-operation. Each dataset from pre-operation, POD 7, and POD 63 has the DSC of 94.55 ± 9.24%, 88.40 ± 18.01%, and 90.64 ± 14.35%. The mean of the measured liver volumes by trained model are 1423.44 ± 270.17 ml in pre-operation, 842.99 ± 190.95 ml in POD 7, and 1048.32 ± 201.02 ml in POD 63. The donor's hepatectomy rate is an average of 39.68 ± 13.06%, and the regeneration rate in POD 63 is an average of 14.78 ± 14.07%.

A hierarchical semantic segmentation framework for computer vision-based bridge damage detection

  • Jingxiao Liu;Yujie Wei ;Bingqing Chen;Hae Young Noh
    • Smart Structures and Systems
    • /
    • v.31 no.4
    • /
    • pp.325-334
    • /
    • 2023
  • Computer vision-based damage detection enables non-contact, efficient and low-cost bridge health monitoring, which reduces the need for labor-intensive manual inspection or that for a large number of on-site sensing instruments. By leveraging recent semantic segmentation approaches, we can detect regions of critical structural components and identify damages at pixel level on images. However, existing methods perform poorly when detecting small and thin damages (e.g., cracks); the problem is exacerbated by imbalanced samples. To this end, we incorporate domain knowledge to introduce a hierarchical semantic segmentation framework that imposes a hierarchical semantic relationship between component categories and damage types. For instance, certain types of concrete cracks are only present on bridge columns, and therefore the noncolumn region may be masked out when detecting such damages. In this way, the damage detection model focuses on extracting features from relevant structural components and avoid those from irrelevant regions. We also utilize multi-scale augmentation to preserve contextual information of each image, without losing the ability to handle small and/or thin damages. In addition, our framework employs an importance sampling, where images with rare components are sampled more often, to address sample imbalance. We evaluated our framework on a public synthetic dataset that consists of 2,000 railway bridges. Our framework achieves a 0.836 mean intersection over union (IoU) for structural component segmentation and a 0.483 mean IoU for damage segmentation. Our results have in total 5% and 18% improvements for the structural component segmentation and damage segmentation tasks, respectively, compared to the best-performing baseline model.

Effect Analysis of Data Imbalance for Emotion Recognition Based on Deep Learning (딥러닝기반 감정인식에서 데이터 불균형이 미치는 영향 분석)

  • Hajin Noh;Yujin Lim
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.12 no.8
    • /
    • pp.235-242
    • /
    • 2023
  • In recent years, as online counseling for infants and adolescents has increased, CNN-based deep learning models are widely used as assistance tools for emotion recognition. However, since most emotion recognition models are trained on mainly adult data, there are performance restrictions to apply the model to infants and adolescents. In this paper, in order to analyze the performance constraints, the characteristics of facial expressions for emotional recognition of infants and adolescents compared to adults are analyzed through LIME method, one of the XAI techniques. In addition, the experiments are performed on the male and female groups to analyze the characteristics of gender-specific facial expressions. As a result, we describe age-specific and gender-specific experimental results based on the data distribution of the pre-training dataset of CNN models and highlight the importance of balanced learning data.

Decision Tree Induction with Imbalanced Data Set: A Case of Health Insurance Bill Audit in a General Hospital (불균형 데이터 집합에서의 의사결정나무 추론: 종합 병원의 건강 보험료 청구 심사 사례)

  • Hur, Joon;Kim, Jong-Woo
    • Information Systems Review
    • /
    • v.9 no.1
    • /
    • pp.45-65
    • /
    • 2007
  • In medical industry, health insurance bill audit is unique and essential process in general hospitals. The health insurance bill audit process is very important because not only for hospital's profit but also hospital's reputation. Particularly, at the large general hospitals many related workers including analysts, nurses, and etc. have engaged in the health insurance bill audit process. This paper introduces a case of health insurance bill audit for finding reducible health insurance bill cases using decision tree induction techniques at a large general hospital in Korea. When supervised learning methods had been tried to be applied, one of major problems was data imbalance problem in the health insurance bill audit data. In other words, there were many normal(passing) cases and relatively small number of reduction cases in a bill audit dataset. To resolve the problem, in this study, well-known methods for imbalanced data sets including over sampling of rare cases, under sampling of major cases, and adjusting the misclassification cost are combined in several ways to find appropriate decision trees that satisfy required conditions in health insurance bill audit situation.

Sentiment Analysis of Product Reviews to Identify Deceptive Rating Information in Social Media: A SentiDeceptive Approach

  • Marwat, M. Irfan;Khan, Javed Ali;Alshehri, Dr. Mohammad Dahman;Ali, Muhammad Asghar;Hizbullah;Ali, Haider;Assam, Muhammad
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.3
    • /
    • pp.830-860
    • /
    • 2022
  • [Introduction] Nowadays, many companies are shifting their businesses online due to the growing trend among customers to buy and shop online, as people prefer online purchasing products. [Problem] Users share a vast amount of information about products, making it difficult and challenging for the end-users to make certain decisions. [Motivation] Therefore, we need a mechanism to automatically analyze end-user opinions, thoughts, or feelings in the social media platform about the products that might be useful for the customers to make or change their decisions about buying or purchasing specific products. [Proposed Solution] For this purpose, we proposed an automated SentiDecpective approach, which classifies end-user reviews into negative, positive, and neutral sentiments and identifies deceptive crowd-users rating information in the social media platform to help the user in decision-making. [Methodology] For this purpose, we first collected 11781 end-users comments from the Amazon store and Flipkart web application covering distant products, such as watches, mobile, shoes, clothes, and perfumes. Next, we develop a coding guideline used as a base for the comments annotation process. We then applied the content analysis approach and existing VADER library to annotate the end-user comments in the data set with the identified codes, which results in a labelled data set used as an input to the machine learning classifiers. Finally, we applied the sentiment analysis approach to identify the end-users opinions and overcome the deceptive rating information in the social media platforms by first preprocessing the input data to remove the irrelevant (stop words, special characters, etc.) data from the dataset, employing two standard resampling approaches to balance the data set, i-e, oversampling, and under-sampling, extract different features (TF-IDF and BOW) from the textual data in the data set and then train & test the machine learning algorithms by applying a standard cross-validation approach (KFold and Shuffle Split). [Results/Outcomes] Furthermore, to support our research study, we developed an automated tool that automatically analyzes each customer feedback and displays the collective sentiments of customers about a specific product with the help of a graph, which helps customers to make certain decisions. In a nutshell, our proposed sentiments approach produces good results when identifying the customer sentiments from the online user feedbacks, i-e, obtained an average 94.01% precision, 93.69% recall, and 93.81% F-measure value for classifying positive sentiments.

The Performance Improvement of U-Net Model for Landcover Semantic Segmentation through Data Augmentation (데이터 확장을 통한 토지피복분류 U-Net 모델의 성능 개선)

  • Baek, Won-Kyung;Lee, Moung-Jin;Jung, Hyung-Sup
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_2
    • /
    • pp.1663-1676
    • /
    • 2022
  • Recently, a number of deep-learning based land cover segmentation studies have been introduced. Some studies denoted that the performance of land cover segmentation deteriorated due to insufficient training data. In this study, we verified the improvement of land cover segmentation performance through data augmentation. U-Net was implemented for the segmentation model. And 2020 satellite-derived landcover dataset was utilized for the study data. The pixel accuracies were 0.905 and 0.923 for U-Net trained by original and augmented data respectively. And the mean F1 scores of those models were 0.720 and 0.775 respectively, indicating the better performance of data augmentation. In addition, F1 scores for building, road, paddy field, upland field, forest, and unclassified area class were 0.770, 0.568, 0.433, 0.455, 0.964, and 0.830 for the U-Net trained by original data. It is verified that data augmentation is effective in that the F1 scores of every class were improved to 0.838, 0.660, 0.791, 0.530, 0.969, and 0.860 respectively. Although, we applied data augmentation without considering class balances, we find that data augmentation can mitigate biased segmentation performance caused by data imbalance problems from the comparisons between the performances of two models. It is expected that this study would help to prove the importance and effectiveness of data augmentation in various image processing fields.

Comparative Study of Anomaly Detection Accuracy of Intrusion Detection Systems Based on Various Data Preprocessing Techniques (다양한 데이터 전처리 기법 기반 침입탐지 시스템의 이상탐지 정확도 비교 연구)

  • Park, Kyungseon;Kim, Kangseok
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.10 no.11
    • /
    • pp.449-456
    • /
    • 2021
  • An intrusion detection system is a technology that detects abnormal behaviors that violate security, and detects abnormal operations and prevents system attacks. Existing intrusion detection systems have been designed using statistical analysis or anomaly detection techniques for traffic patterns, but modern systems generate a variety of traffic different from existing systems due to rapidly growing technologies, so the existing methods have limitations. In order to overcome this limitation, study on intrusion detection methods applying various machine learning techniques is being actively conducted. In this study, a comparative study was conducted on data preprocessing techniques that can improve the accuracy of anomaly detection using NGIDS-DS (Next Generation IDS Database) generated by simulation equipment for traffic in various network environments. Padding and sliding window were used as data preprocessing, and an oversampling technique with Adversarial Auto-Encoder (AAE) was applied to solve the problem of imbalance between the normal data rate and the abnormal data rate. In addition, the performance improvement of detection accuracy was confirmed by using Skip-gram among the Word2Vec techniques that can extract feature vectors of preprocessed sequence data. PCA-SVM and GRU were used as models for comparative experiments, and the experimental results showed better performance when sliding window, skip-gram, AAE, and GRU were applied.

Sorghum Field Segmentation with U-Net from UAV RGB (무인기 기반 RGB 영상 활용 U-Net을 이용한 수수 재배지 분할)

  • Kisu Park;Chanseok Ryu ;Yeseong Kang;Eunri Kim;Jongchan Jeong;Jinki Park
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.5_1
    • /
    • pp.521-535
    • /
    • 2023
  • When converting rice fields into fields,sorghum (sorghum bicolor L. Moench) has excellent moisture resistance, enabling stable production along with soybeans. Therefore, it is a crop that is expected to improve the self-sufficiency rate of domestic food crops and solve the rice supply-demand imbalance problem. However, there is a lack of fundamental statistics,such as cultivation fields required for estimating yields, due to the traditional survey method, which takes a long time even with a large manpower. In this study, U-Net was applied to RGB images based on unmanned aerial vehicle to confirm the possibility of non-destructive segmentation of sorghum cultivation fields. RGB images were acquired on July 28, August 13, and August 25, 2022. On each image acquisition date, datasets were divided into 6,000 training datasets and 1,000 validation datasets with a size of 512 × 512 images. Classification models were developed based on three classes consisting of Sorghum fields(sorghum), rice and soybean fields(others), and non-agricultural fields(background), and two classes consisting of sorghum and non-sorghum (others+background). The classification accuracy of sorghum cultivation fields was higher than 0.91 in the three class-based models at all acquisition dates, but learning confusion occurred in the other classes in the August dataset. In contrast, the two-class-based model showed an accuracy of 0.95 or better in all classes, with stable learning on the August dataset. As a result, two class-based models in August will be advantageous for calculating the cultivation fields of sorghum.