• 제목/요약/키워드: Training Datasets

검색결과 340건 처리시간 0.021초

Performance Analysis of Cloud-Net with Cross-sensor Training Dataset for Satellite Image-based Cloud Detection

  • Kim, Mi-Jeong;Ko, Yun-Ho
    • 대한원격탐사학회지
    • /
    • 제38권1호
    • /
    • pp.103-110
    • /
    • 2022
  • Since satellite images generally include clouds in the atmosphere, it is essential to detect or mask clouds before satellite image processing. Clouds were detected using physical characteristics of clouds in previous research. Cloud detection methods using deep learning techniques such as CNN or the modified U-Net in image segmentation field have been studied recently. Since image segmentation is the process of assigning a label to every pixel in an image, precise pixel-based dataset is required for cloud detection. Obtaining accurate training datasets is more important than a network configuration in image segmentation for cloud detection. Existing deep learning techniques used different training datasets. And test datasets were extracted from intra-dataset which were acquired by same sensor and procedure as training dataset. Different datasets make it difficult to determine which network shows a better overall performance. To verify the effectiveness of the cloud detection network such as Cloud-Net, two types of networks were trained using the cloud dataset from KOMPSAT-3 images provided by the AIHUB site and the L8-Cloud dataset from Landsat8 images which was publicly opened by a Cloud-Net author. Test data from intra-dataset of KOMPSAT-3 cloud dataset were used for validating the network. The simulation results show that the network trained with KOMPSAT-3 cloud dataset shows good performance on the network trained with L8-Cloud dataset. Because Landsat8 and KOMPSAT-3 satellite images have different GSDs, making it difficult to achieve good results from cross-sensor validation. The network could be superior for intra-dataset, but it could be inferior for cross-sensor data. It is necessary to study techniques that show good results in cross-senor validation dataset in the future.

Tri-training algorithm based on cross entropy and K-nearest neighbors for network intrusion detection

  • Zhao, Jia;Li, Song;Wu, Runxiu;Zhang, Yiying;Zhang, Bo;Han, Longzhe
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제16권12호
    • /
    • pp.3889-3903
    • /
    • 2022
  • To address the problem of low detection accuracy due to training noise caused by mislabeling when Tri-training for network intrusion detection (NID), we propose a Tri-training algorithm based on cross entropy and K-nearest neighbors (TCK) for network intrusion detection. The proposed algorithm uses cross-entropy to replace the classification error rate to better identify the difference between the practical and predicted distributions of the model and reduce the prediction bias of mislabeled data to unlabeled data; K-nearest neighbors are used to remove the mislabeled data and reduce the number of mislabeled data. In order to verify the effectiveness of the algorithm proposed in this paper, experiments were conducted on 12 UCI datasets and NSL-KDD network intrusion datasets, and four indexes including accuracy, recall, F-measure and precision were used for comparison. The experimental results revealed that the TCK has superior performance than the conventional Tri-training algorithms and the Tri-training algorithms using only cross-entropy or K-nearest neighbor strategy.

DeepCleanNet: Training Deep Convolutional Neural Network with Extremely Noisy Labels

  • Olimov, Bekhzod;Kim, Jeonghong
    • 한국멀티미디어학회논문지
    • /
    • 제23권11호
    • /
    • pp.1349-1360
    • /
    • 2020
  • In recent years, Convolutional Neural Networks (CNNs) have been successfully implemented in different tasks of computer vision. Since CNN models are the representatives of supervised learning algorithms, they demand large amount of data in order to train the classifiers. Thus, obtaining data with correct labels is imperative to attain the state-of-the-art performance of the CNN models. However, labelling datasets is quite tedious and expensive process, therefore real-life datasets often exhibit incorrect labels. Although the issue of poorly labelled datasets has been studied before, we have noticed that the methods are very complex and hard to reproduce. Therefore, in this research work, we propose Deep CleanNet - a considerably simple system that achieves competitive results when compared to the existing methods. We use K-means clustering algorithm for selecting data with correct labels and train the new dataset using a deep CNN model. The technique achieves competitive results in both training and validation stages. We conducted experiments using MNIST database of handwritten digits with 50% corrupted labels and achieved up to 10 and 20% increase in training and validation sets accuracy scores, respectively.

가상 환경에서의 딥러닝 기반 폐색영역 검출을 위한 데이터베이스 구축 (Construction of Database for Deep Learning-based Occlusion Area Detection in the Virtual Environment)

  • 김경수;이재인;곽석우;강원율;신대영;황성호
    • 드라이브 ㆍ 컨트롤
    • /
    • 제19권3호
    • /
    • pp.9-15
    • /
    • 2022
  • This paper proposes a method for constructing and verifying datasets used in deep learning technology, to prevent safety accidents in automated construction machinery or autonomous vehicles. Although open datasets for developing image recognition technologies are challenging to meet requirements desired by users, this study proposes the interface of virtual simulators to facilitate the creation of training datasets desired by users. The pixel-level training image dataset was verified by creating scenarios, including various road types and objects in a virtual environment. Detecting an object from an image may interfere with the accurate path determination due to occlusion areas covered by another object. Thus, we construct a database, for developing an occlusion area detection algorithm in a virtual environment. Additionally, we present the possibility of its use as a deep learning dataset to calculate a grid map, that enables path search considering occlusion areas. Custom datasets are built using the RDBMS system.

Crop Leaf Disease Identification Using Deep Transfer Learning

  • Changjian Zhou;Yutong Zhang;Wenzhong Zhao
    • Journal of Information Processing Systems
    • /
    • 제20권2호
    • /
    • pp.149-158
    • /
    • 2024
  • Traditional manual identification of crop leaf diseases is challenging. Owing to the limitations in manpower and resources, it is challenging to explore crop diseases on a large scale. The emergence of artificial intelligence technologies, particularly the extensive application of deep learning technologies, is expected to overcome these challenges and greatly improve the accuracy and efficiency of crop disease identification. Crop leaf disease identification models have been designed and trained using large-scale training data, enabling them to predict different categories of diseases from unlabeled crop leaves. However, these models, which possess strong feature representation capabilities, require substantial training data, and there is often a shortage of such datasets in practical farming scenarios. To address this issue and improve the feature learning abilities of models, this study proposes a deep transfer learning adaptation strategy. The novel proposed method aims to transfer the weights and parameters from pre-trained models in similar large-scale training datasets, such as ImageNet. ImageNet pre-trained weights are adopted and fine-tuned with the features of crop leaf diseases to improve prediction ability. In this study, we collected 16,060 crop leaf disease images, spanning 12 categories, for training. The experimental results demonstrate that an impressive accuracy of 98% is achieved using the proposed method on the transferred ResNet-50 model, thereby confirming the effectiveness of our transfer learning approach.

샴 네트워크를 사용하여 추적 레이블을 사용하지 않는 다중 객체 검출 및 추적기 학습에 관한 연구 (Training of a Siamese Network to Build a Tracker without Using Tracking Labels)

  • 강정규;송유승;민경욱;최정단
    • 한국ITS학회 논문지
    • /
    • 제21권5호
    • /
    • pp.274-286
    • /
    • 2022
  • 이동객체 추적은 컴퓨터 비전 분야에서 오랜 시간 동안 연구가 진행되어 온 분야로 자율주행이나 운전 보조 시스템 등의 시스템에서 아주 중요한 역할을 수행하고 있다. 이동객체 추적 기술은 일반적으로 객체를 검출하는 검출기와 검출된 객체를 추적하는 추적기의 결합으로 이루어져 있다. 검출기는 다양한 데이터셋이 공개되어 사용되고 있기 때문에 쉽게 좋은 모델을 학습할 수 있지만, 추적기의 경우 상대적으로 공개된 데이터셋도 적고 직접 데이터셋을 구성하는 것도 검출기 데이터셋에 비해 굉장히 오랜 시간을 소요한다. 이에 검출기를 따로 개발하고, 별도의 추적기를 학습 기반이 아닌 방식을 활용하여 개발하는 경우가 많은데 이런 경우 두 개의 시스템이 차례로 작동하게 되어 전체 시스템의 속도를 느리게 하고 앞단의 검출기의 성능이 변할 때마다 별도로 추적기 또한 조정해줘야 한다는 단점이 있다. 이에 본 연구는 검출용 데이터셋만을 사용하여 검출과 추적을 동시에 수행하는 모델을 구성하는 방법을 제안한다. 데이터 증강 기술과 샴 네트워크를 사용하여 단일 이미지에서 객체를 검출 및 추적하는 방법을 연구하였다. 공개 데이터셋에 실험을 진행하여 학습 결과 높은 속도로 작동하는 이동객체 검출 및 추적기를 학습할 수 있음을 검증하였다.

Reconstructing the cosmic density field based on the generative adversarial network.

  • Shi, Feng
    • 천문학회보
    • /
    • 제45권1호
    • /
    • pp.50.1-50.1
    • /
    • 2020
  • In this topic, I will introduce a recent work on reconstructing the cosmic density field based on the GAN. I will show the performance of the GAN compared to the traditional Unet architecture. I'd also like to discuss a 3-channels-based 2D datasets for the training to recover the 3D density field. Finally, I will present some performance tests based on the test datasets.

  • PDF

Development of Tourism Information Named Entity Recognition Datasets for the Fine-tune KoBERT-CRF Model

  • Jwa, Myeong-Cheol;Jwa, Jeong-Woo
    • International Journal of Internet, Broadcasting and Communication
    • /
    • 제14권2호
    • /
    • pp.55-62
    • /
    • 2022
  • A smart tourism chatbot is needed as a user interface to efficiently provide smart tourism services such as recommended travel products, tourist information, my travel itinerary, and tour guide service to tourists. We have been developed a smart tourism app and a smart tourism information system that provide smart tourism services to tourists. We also developed a smart tourism chatbot service consisting of khaiii morpheme analyzer, rule-based intention classification, and tourism information knowledge base using Neo4j graph database. In this paper, we develop the Korean and English smart tourism Name Entity (NE) datasets required for the development of the NER model using the pre-trained language models (PLMs) for the smart tourism chatbot system. We create the tourism information NER datasets by collecting source data through smart tourism app, visitJeju web of Jeju Tourism Organization (JTO), and web search, and preprocessing it using Korean and English tourism information Name Entity dictionaries. We perform training on the KoBERT-CRF NER model using the developed Korean and English tourism information NER datasets. The weight-averaged precision, recall, and f1 scores are 0.94, 0.92 and 0.94 on Korean and English tourism information NER datasets.

자동-레이블링 기반 영상 학습데이터 제작 시스템 (An Auto-Labeling based Smart Image Annotation System)

  • 이용;장래영;박민우;이건우;최명석
    • 한국콘텐츠학회논문지
    • /
    • 제21권6호
    • /
    • pp.701-715
    • /
    • 2021
  • 최근 딥러닝 기술의 급속한 발전과 함께 학습데이터가 크게 주목을 받고 있다. 일반적으로 딥러닝 방식에서는 모델을 훈련시키기 위해 충분한 학습데이터가 준비되어 있어야 한다. 하지만, 딥러닝 모델 설계 작업과 달리 데이터셋을 제작하는 데 상당한 시간과 노력이 필요하다. 영상 데이터를 주로 다루는 시각지능 분야에서도 학습데이터 제작자들은 전문적인 학습데이터 제작 도구를 사용해 이미지 단위로 레이블링을 수작업으로 하고 있어 여전히 많은 시간과 노력이 필요한 상황이다. 따라서, 다양한 분야에서 필요한 충분한 영상 학습데이터셋을 확보하기 위해 기존의 수작업 방식을 대체할 수 있는 레이블링 기술이 필요하다. 본 논문에서는, 영상 학습데이터셋 동향을 소개하고, 학습데이터 제작 환경에 대해 분석한다 특히, 수작업으로 이루어지는 반복적이고 수고스러운 레이블링 과정을 자동화하여, '확인과 수정'의 단계를 비약적으로 단축시킬 수 있는 '스마트 영상학습데이터 제작 시스템'을 제안한다. 그리고, 실험을 통해 영상 학습데이터 제작 과정에서 이미지에 박스형 및 폴리곤형 객체영역을 지정하여 레이블링하는 데 소요되는 시간을 크게 줄이기 위한 자동레이블링 방식의 효과를 검증한다. 마지막으로, 제안하는 시스템의 실험에서 추가적으로 검증되어야 하는 부분과 함께 이를 개선하기 위한 향후 연구 계획에 대해 논의한다.

Training-Free Fuzzy Logic Based Human Activity Recognition

  • Kim, Eunju;Helal, Sumi
    • Journal of Information Processing Systems
    • /
    • 제10권3호
    • /
    • pp.335-354
    • /
    • 2014
  • The accuracy of training-based activity recognition depends on the training procedure and the extent to which the training dataset comprehensively represents the activity and its varieties. Additionally, training incurs substantial cost and effort in the process of collecting training data. To address these limitations, we have developed a training-free activity recognition approach based on a fuzzy logic algorithm that utilizes a generic activity model and an associated activity semantic knowledge. The approach is validated through experimentation with real activity datasets. Results show that the fuzzy logic based algorithms exhibit comparable or better accuracy than other training-based approaches.