• Title/Abstract/Keywords: dataset

Search results: 4,026 items (processing time: 0.029 s)

GAN-based Data Augmentation Methods for Topology Optimization

  • 이승혜;이유진;이기학;이재홍
    • 한국공간구조학회논문집 / Vol. 21, No. 4 / pp. 39-48 / 2021
  • In this paper, a GAN-based data augmentation method is proposed for topology optimization. In machine learning, the amount of training data determines the accuracy and robustness of the trained neural network, especially for supervised learning. Because insufficient data tends to lead to overfitting or underfitting, a data augmentation method is needed to increase the amount of data and reduce overfitting when training a machine learning model. In this study, a Generative Adversarial Network (GAN) is used to augment the topology optimization dataset. The generated dataset is compared with the original dataset.

Semantic Segmentation of Heterogeneous Unmanned Aerial Vehicle Datasets Using Combined Segmentation Network

  • Ahram, Song
    • 대한원격탐사학회지 / Vol. 39, No. 1 / pp. 87-97 / 2023
  • Unmanned aerial vehicles (UAVs) can capture high-resolution imagery from a variety of viewing angles and altitudes; however, they are generally limited to collecting images of small scenes from larger regions. To improve the utility of UAV-appropriated datasets for use with deep learning applications, multiple datasets created from various regions under different conditions are needed. To demonstrate a powerful new method for integrating heterogeneous UAV datasets, this paper applies a combined segmentation network (CSN) whose encoding blocks are shared between the UAVid and semantic drone datasets to learn their general features, whereas its decoding blocks are trained separately on each dataset. Experimental results show that the CSN improves the accuracy of specific classes (e.g., cars) that currently make up a small proportion of both datasets. From this result, it is expected that the range of UAV dataset utilization will increase.

Performance of Random Forest Classifier for Flood Mapping Using Sentinel-1 SAR Images

  • Chu, Yongjae;Lee, Hoonyol
    • 대한원격탐사학회지 / Vol. 38, No. 4 / pp. 375-386 / 2022
  • The city of Khartoum, the capital of Sudan, was heavily damaged by the flood of the Nile in 2020. Classification using satellite images can delineate the damaged area and support emergency response. Because Synthetic Aperture Radar (SAR) uses microwaves that can penetrate clouds, it is well suited to flood studies. In this study, the Random Forest classifier, a supervised classification algorithm, was applied to the flood event in Khartoum with various training dataset sizes and numbers of Sentinel-1 SAR images. To create the training dataset, we used unsupervised classification and visual inspection. First, Random Forest was run while reducing the size of each class in the training dataset, but no notable difference was found. Next, we ran Random Forest with various numbers of images. Accuracy improved as the number of images increased, but converged to a maximum value once the dataset covered the period from the flood to the completion of drainage.

Classification for Imbalanced Breast Cancer Dataset Using Resampling Methods

  • Hana Babiker, Nassar
    • International Journal of Computer Science & Network Security / Vol. 23, No. 1 / pp. 89-95 / 2023
  • Analyzing breast cancer patient files is becoming an exciting area of medical information analysis, especially with the increasing number of patient files. In this paper, breast cancer data is collected from Khartoum state hospital, and the dataset is classified into recurrence and no recurrence. The data is imbalanced, meaning that one of the two classes has more samples than the other. Several pre-processing techniques are applied before classifying this imbalanced data (resampling, attribute selection, and handling of missing values), and then different classifier models are built. In the first experiment, five classifiers (ANN, REP TREE, SVM, and J48) are used, and in the second experiment, meta-learning algorithms (Bagging, Boosting, and Random subspace) are used. Finally, the ensemble model is used. The best result was obtained from the ensemble model (Boosting with J48), with the highest accuracy (95.2797%) among all the algorithms, followed by Bagging with J48 (90.559%) and Random subspace with J48 (84.2657%). The imbalanced breast cancer dataset was classified into recurrence and no recurrence with different classification algorithms, and the best result was obtained from the ensemble model.
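
The abstract above names resampling as a pre-processing step but does not say which variant was used. As a minimal sketch, assuming plain random oversampling of the minority class (one common choice, not necessarily the authors'), the idea can be illustrated as follows. The function name `random_oversample` and the toy records are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(xs) for xs in by_class.values())

    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


# Hypothetical toy data: 6 "no" (no recurrence) vs. 2 "yes" (recurrence) records.
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = ["no"] * 6 + ["yes"] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have 6 samples
```

Oversampling should be applied to the training split only; duplicating records before splitting would leak copies of the same sample into the test set and inflate the reported accuracy.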

Classification of Network Traffic using Machine Learning for Software Defined Networks

  • Muhammad Shahzad Haroon;Husnain Mansoor
    • International Journal of Computer Science & Network Security / Vol. 23, No. 12 / pp. 91-100 / 2023
  • As SDN devices and systems hit the market, security in SDN must be raised on the agenda. SDN has become an area of interest in both academia and industry. SDN promises many benefits, which attract many IT managers and leading IT companies and motivate them to switch to SDN. Over the last three decades, network attacks have become more sophisticated and harder to detect. The goal is to study how traffic information can be extracted from an SDN controller and open virtual switches (OVS) using SDN mechanisms. The testbed environment is created using the RYU controller and Mininet. The extracted information is then used to detect these attacks efficiently with a machine learning approach. A machine learning approach requires a dataset, and no public SDN-based dataset is currently available. In this paper, an SDN-based dataset is created that includes legitimate and non-legitimate traffic. Classification is divided into two categories: binary and multiclass classification. Traffic is classified with and without dimension reduction techniques such as PCA and LDA. Our approach achieves 98.58% accuracy using a random forest algorithm.

KMSAV: Korean multi-speaker spontaneous audiovisual dataset

  • Kiyoung Park;Changhan Oh;Sunghee Dong
    • ETRI Journal / Vol. 46, No. 1 / pp. 71-81 / 2024
  • Recent advances in deep learning for speech and visual recognition have accelerated the development of multimodal speech recognition, yielding many innovative results. We introduce a Korean audiovisual speech recognition corpus. This dataset comprises approximately 150 h of manually transcribed and annotated audiovisual data, supplemented with an additional 2000 h of untranscribed videos collected from YouTube under the Creative Commons License. The dataset is intended to be freely accessible for unrestricted research purposes. Along with the corpus, we propose an open-source framework for automatic speech recognition (ASR) and audiovisual speech recognition (AVSR). We validate the effectiveness of the corpus with evaluations using state-of-the-art ASR and AVSR techniques, capitalizing on both pretrained models and fine-tuning processes. After fine-tuning, ASR and AVSR achieve character error rates of 11.1% and 18.9%, respectively. This error difference highlights the need for improvement in AVSR techniques. We expect that our corpus will be an instrumental resource to support improvements in AVSR.
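
The character error rate (CER) reported above is conventionally the character-level edit distance between the reference transcript and the recognizer output, divided by the reference length. The sketch below illustrates that standard definition; it is not the authors' evaluation code, and details such as whether spaces are counted are assumptions (here every character, including spaces, is counted). The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: the minimum number of character insertions,
    deletions, and substitutions that turn ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# One substituted syllable out of five reference characters.
print(character_error_rate("안녕하세요", "안녕하세오"))  # 0.2
```

Because insertions are counted, CER can exceed 1.0 when the hypothesis is much longer than the reference.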

Development of a Prediction Model for Greenhouse Control Based on Machine Learning

  • 김상엽;박경섭;이상민;허병문;류근호
    • 디지털콘텐츠학회 논문지 / Vol. 19, No. 4 / pp. 749-756 / 2018
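  • The purpose of this study is to develop a prediction model for greenhouse control using machine learning techniques. The model was developed using data measured in 2016 in the experimental greenhouse of the Protected Horticulture Research Institute. To improve the predictive performance of the model and ensure the reliability of the data, the data were reduced through correlation analysis. The data were organized into spring, summer, autumn, and winter sets to account for seasonal characteristics. An artificial neural network, a recurrent neural network, and a multiple regression model were built as machine-learning-based prediction models, and their validity was evaluated through comparative analysis. In the results, the artificial neural network model showed good predictive performance on the Selected dataset, and the multiple regression model on the Full dataset.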
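
The abstract does not detail how correlation analysis was used to reduce the data. One plausible reading, shown here purely as an assumption, is to keep only the input variables whose Pearson correlation with the prediction target exceeds a threshold. The variable names, the toy values, and the 0.5 threshold are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)


def select_features(columns, target, threshold=0.5):
    """Return the names of columns with |r| >= threshold against the target."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]


# Hypothetical greenhouse measurements (five time steps).
inside_temp = [20, 22, 24, 26, 28]          # prediction target
columns = {
    "outside_temp": [10, 12, 14, 16, 18],   # rises with the target
    "solar_radiation": [5, 4, 3, 2, 1],     # falls as the target rises
    "sensor_noise": [1, -1, 0, -1, 1],      # uncorrelated with the target
}

print(select_features(columns, inside_temp))
```

Under this reading, the columns returned would form something like the abstract's "Selected dataset", while all columns together would form the "Full dataset".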

Feasibility of fully automated classification of whole slide images based on deep learning

  • Cho, Kyung-Ok;Lee, Sung Hak;Jang, Hyun-Jong
    • The Korean Journal of Physiology and Pharmacology / Vol. 24, No. 1 / pp. 89-99 / 2020
  • Although microscopic analysis of tissue slides has been the basis for disease diagnosis for decades, intra- and inter-observer variabilities remain issues to be resolved. The recent introduction of digital scanners has allowed deep learning to be used in the analysis of tissue images because many whole slide images (WSIs) are accessible to researchers. In the present study, we investigated the possibility of a deep learning-based, fully automated, computer-aided diagnosis system with WSIs from a stomach adenocarcinoma dataset. Three different convolutional neural network architectures were tested to determine the best architecture for the tissue classifier. Each network was trained to classify small tissue patches as normal or tumor. Based on the patch-level classification, tumor probability heatmaps can be overlaid on tissue images. We observed three different tissue patterns: clear normal, clear tumor, and ambiguous cases. We suggest that longer inspection time can be assigned to ambiguous cases than to clear normal cases, increasing the accuracy and efficiency of histopathologic diagnosis by pre-evaluating the status of the WSIs. When the classifier was tested with a completely different WSI dataset, the performance was not optimal because of the different tissue preparation quality. Including a small amount of data from the new dataset in training greatly improved performance on the new dataset. These results indicate that a WSI dataset should include tissues prepared under many different preparation conditions to construct a generalized tissue classifier. Thus, multi-national/multi-center datasets should be built for the application of deep learning in real-world medical practice.
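
The abstract describes pre-evaluating slides into clear normal, clear tumor, and ambiguous cases from patch-level predictions, but gives no decision rule. The sketch below is one hypothetical way to do such triage, based on the fraction of patches predicted as tumor. The function name and all three cutoffs are illustrative assumptions, not values from the paper.

```python
def triage_slide(patch_probs, patch_cutoff=0.5, low=0.05, high=0.5):
    """Label a slide from its per-patch tumor probabilities.

    patch_cutoff: probability at or above which a patch counts as tumor.
    low / high:   tumor-patch fractions bounding the "ambiguous" band.
    All three values are illustrative assumptions.
    """
    if not patch_probs:
        raise ValueError("slide has no patches")
    tumor_patches = sum(1 for p in patch_probs if p >= patch_cutoff)
    tumor_fraction = tumor_patches / len(patch_probs)
    if tumor_fraction <= low:
        return "clear normal"
    if tumor_fraction >= high:
        return "clear tumor"
    return "ambiguous"


# Hypothetical slides, each given as a flat list of patch probabilities.
print(triage_slide([0.01, 0.02, 0.03, 0.10]))       # clear normal
print(triage_slide([0.90, 0.80, 0.95, 0.20]))       # clear tumor
print(triage_slide([0.90, 0.10, 0.10, 0.20, 0.10])) # ambiguous
```

A triage step like this would only prioritize reviewer attention; slides labeled "ambiguous" are the ones the abstract suggests should receive longer inspection time.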

An Effective Control Scheme for Unstructured Datasets in Communication Environments

  • 배명남;최완;이동춘
    • 정보처리학회논문지C / Vol. 9C, No. 1 / pp. 31-38 / 2002
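  • In communication systems such as switching systems, requested events must be completed within specified time constraints. Application data maintained in the system must therefore be quickly accessible, and completion of events within the limited time must be guaranteed. Many data systems are currently in use, but they provide only structured data layouts and basic operations on them. As the complexity of data in communication applications has recently increased, there is a need for a scheme that, unlike existing approaches, can represent unstructured data and provide easy access to it. To this end, this paper introduces a data model suited to modeling unstructured multi-application environments. The model provides fast access to datasets and a scheme for easily extracting the required data. In addition, several detailed algorithms are described to clarify the characteristics of the model.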

Hangul Font Dataset for Korean Font Research Based on Deep Learning

  • 고홍희;이현수;석정재;;최재영
    • 정보처리학회논문지:소프트웨어 및 데이터공학 / Vol. 10, No. 2 / pp. 73-78 / 2021
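  • With growing interest in deep learning, research using it is under way in various fields. However, studies on automatic font generation using deep-learning-based generative models have been limited to a few scripts, such as Roman letters and Chinese characters. Hangul font design requires a great deal of time and cost, and deep learning can make font generation much easier. For research on generating Hangul fonts, it is important to prepare a Hangul font dataset from the perspective of process automation, in order to keep pace with deep-learning-based generative models. To this end, this paper proposes a Hangul font dataset for deep-learning-based Hangul font research and describes how to construct it. The usefulness of the proposed dataset composition is demonstrated by applying the dataset to a deep-learning-based Hangul font generation application.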