• Title/Summary/Keyword: large dataset

Machine Learning-based Screening Algorithm for Energy Storage System Using Retired Lithium-ion Batteries (에너지 저장 시스템 적용을 위한 머신러닝 기반의 폐배터리 스크리닝 알고리즘)

  • Han, Eui-Seong;Lim, Je-Yeong;Lee, Hyeon-Ho;Kim, Dong-Hwan;Noh, Tae-Won;Lee, Byoung-Kuk
    • The Transactions of the Korean Institute of Power Electronics / v.27 no.3 / pp.265-274 / 2022
  • This paper proposes a machine learning-based screening algorithm for building retired battery packs for energy storage systems. The proposed algorithm creates a dataset of various performance parameters of the retired batteries, and this dataset is preprocessed with principal component analysis to reduce overfitting. Retired batteries with large deviations are excluded from the dataset using density-based spatial clustering of applications with noise (DBSCAN), and K-means clustering is then formulated to select a group of retired batteries that satisfies the deviation requirements. The performance of the proposed algorithm is verified using the NASA and Oxford datasets.
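
The screening flow described above (exclude outlying cells with DBSCAN, then group the rest with K-means under a deviation limit) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the PCA preprocessing step is omitted, each cell is assumed to be already reduced to two features, and the function names (`dbscan_noise`, `kmeans`, `screen_cells`) and thresholds (`eps`, `min_pts`, `max_spread`) are hypothetical.

```python
import math
import random


def dbscan_noise(points, eps, min_pts):
    """Indices DBSCAN would label as noise: not core points and not within eps of a core point."""
    n = len(points)
    # Each neighborhood includes the point itself, as in the usual DBSCAN definition.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    return {i for i in range(n) if i not in core and not any(j in core for j in neighbors[i])}


def assign(points, centers):
    """Group each point with its nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        groups[nearest].append(p)
    return groups


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means seeded with k randomly chosen points; returns (centers, groups)."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = assign(points, centers)
        new_centers = [tuple(sum(v) / len(g) for v in zip(*g)) if g else centers[c]
                       for c, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def screen_cells(points, eps, min_pts, k, max_spread):
    """Hypothetical screening: drop DBSCAN noise, cluster the rest with k-means,
    and keep only clusters whose largest distance to the center is within max_spread."""
    noise = dbscan_noise(points, eps, min_pts)
    kept = [p for i, p in enumerate(points) if i not in noise]
    centers, groups = kmeans(kept, k)
    return [g for c, g in zip(centers, groups)
            if g and max(math.dist(p, c) for p in g) <= max_spread]


# Hypothetical cells described by two already-reduced features (e.g. PCA scores).
cells = [(1.00, 1.00), (1.01, 0.99), (0.99, 1.01), (1.00, 1.02),
         (0.80, 1.30), (0.81, 1.31), (0.79, 1.29), (0.80, 1.32),
         (0.50, 2.00)]
groups = screen_cells(cells, eps=0.05, min_pts=3, k=2, max_spread=0.05)
```

Here `max_spread` only stands in for the paper's deviation requirement; the abstract does not say how that requirement is defined.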

Synthetic Image Dataset Generation for Defense using Generative Adversarial Networks (국방용 합성이미지 데이터셋 생성을 위한 대립훈련신경망 기술 적용 연구)

  • Yang, Hunmin
    • Journal of the Korea Institute of Military Science and Technology / v.22 no.1 / pp.49-59 / 2019
  • Generative adversarial networks (GANs) have received great attention in the machine learning field for their capacity to implicitly model high-dimensional, complex data distributions and to generate new data samples from the model distribution. This paper investigates the training methodology, architectures, and various applications of generative adversarial networks. An experimental evaluation is also conducted on generating synthetic image datasets for defense using two types of GANs. The first generates military images using deep convolutional generative adversarial networks (DCGAN). The second translates visible images into infrared images using cycle-consistent generative adversarial networks (CycleGAN). Each model yields a wide diversity of high-fidelity synthetic images relative to the training images. This result opens up the possibility of training neural networks on inexpensive synthetic images, avoiding the enormous expense of collecting large amounts of hand-annotated real data.
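
For reference, the two GAN variants mentioned above are usually trained with the standard objectives below. These are the textbook formulations from the original GAN and CycleGAN papers; the abstract does not state which loss variants this study used. Here G is the generator, D the discriminator, p_z the noise prior, and, for CycleGAN, G: X -> Y and F: Y -> X are the two translators (for example, visible to infrared and back), with discriminators D_X and D_Y and a weight lambda on the cycle term.

```latex
% Adversarial objective (original GAN; DCGAN keeps this objective with convolutional G and D)
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]

% Cycle-consistency term used by CycleGAN
\mathcal{L}_{\text{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]

% Full CycleGAN objective: two adversarial terms plus the weighted cycle term
\mathcal{L}(G, F, D_X, D_Y) =
  \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X)
  + \lambda \, \mathcal{L}_{\text{cyc}}(G, F)
```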

Performance analysis of deep learning-based automatic classification of upper endoscopic images according to data construction (딥러닝 기반 상부위장관 내시경 이미지 자동분류의 데이터 구성별 성능 분석 연구)

  • Seo, Jeong Min;Lim, Sang Heon;Kim, Yung Jae;Chung, Jun Won;Kim, Kwang Gi
    • Journal of Korea Multimedia Society / v.25 no.3 / pp.451-460 / 2022
  • Recently, several deep learning studies have been reported that automatically identify the location of diagnostic devices using endoscopic data. However, previous studies were not designed to determine whether the configuration of the dataset leads to differences in the accuracy with which artificial intelligence models classify images. Studies based on large amounts of data are likely to produce different results depending on the composition of the dataset or the proportions of its parts. In this study, we sought to determine whether, and to what extent, accuracy varies with dataset composition by compiling the data into three main types using larynx, esophagus, gastroscopy, and laryngeal endoscopy images.

Efficient Large Dataset Construction using Image Smoothing and Image Size Reduction

  • Jaemin HWANG;Sac LEE;Hyunwoo LEE;Seyun PARK;Jiyoung LIM
    • Korean Journal of Artificial Intelligence / v.11 no.1 / pp.17-24 / 2023
  • With the continuous growth in the amount of data collected and analyzed, deep learning has become increasingly popular for extracting meaningful insights in various fields. However, hardware limitations make it challenging to achieve meaningful results with limited data. To address this challenge, this paper proposes an algorithm that leverages the characteristics of convolutional neural networks (CNNs) to reduce the size of image datasets by 20% by smoothing images and shrinking them using color elements. The proposed algorithm reduces training time and, as a result, the computational load on hardware. The experiments in this study show that the proposed method achieves effective learning with accuracy similar to or slightly higher than that obtained with the original dataset while reducing computational and time costs. This color-centric dataset construction method based on image smoothing can make CNN training more efficient. It can be applied to various tasks, such as image classification and recognition, and can contribute to more efficient and cost-effective deep learning. This paper presents a promising approach to reducing the computational load and time costs associated with deep learning and provides meaningful results with limited data, enabling deep learning to be applied to a broader range of applications.
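
The two generic operations named above, smoothing and size reduction, can be illustrated on a single grayscale channel as below. This is a hedged sketch only: the 3x3 mean filter, the block-averaging downscale, and the function names `box_blur` and `downscale` are assumptions, and the code does not reproduce the paper's color-element-based method or its 20% size reduction.

```python
def box_blur(img):
    """3x3 mean filter on a 2-D list of pixel values; border pixels average only the neighbors that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def downscale(img, factor):
    """Shrink by averaging non-overlapping factor x factor blocks; leftover rows/columns are dropped."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(w)]
            for y in range(h)]


# Hypothetical 4x4 channel: smooth first, then halve each dimension.
channel = [[float(x + y) for x in range(4)] for y in range(4)]
small = downscale(box_blur(channel), 2)
```

For a color image, the same functions would be applied to each channel separately; smoothing before shrinking reduces the aliasing that plain subsampling would introduce.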

Fast robust variable selection using VIF regression in large datasets (대형 데이터에서 VIF회귀를 이용한 신속 강건 변수선택법)

  • Seo, Han Son
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.463-473 / 2018
  • Variable selection algorithms for linear regression models of large data are considered. Many algorithms have been proposed with a focus on speed and robustness. Among them, variance inflation factor (VIF) regression is fast and accurate owing to its streamwise regression approach. However, VIF regression is susceptible to outliers because it estimates the model by least squares. A robust criterion using a weighted estimator has been proposed to make the algorithm robust, and a robust VIF regression has also been proposed for the same purpose. In this article, a fast and robust variable selection method is suggested that performs VIF regression after detecting and removing potential outliers. A simulation study and a data analysis are conducted to compare the suggested method with other methods.
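
The variance inflation factor behind VIF regression is defined as VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination from regressing predictor j on the remaining predictors. The sketch below computes that quantity directly with ordinary least squares in pure Python. It shows the classical definition only, not the streamwise VIF regression algorithm or the outlier-removal step proposed in the paper, and the function names (`solve`, `r_squared`, `vif`) are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, xs):
    """R^2 of a least-squares fit of y on the columns in xs plus an intercept (normal equations)."""
    cols = [[1.0] * len(y)] + xs
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(beta, cols)) for i in range(len(y))]
    mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns, j):
    """Variance inflation factor of predictor j given the other predictor columns."""
    others = [c for k, c in enumerate(columns) if k != j]
    return 1.0 / (1.0 - r_squared(columns[j], others))


x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [1.0, 1.0, -1.0, -1.0]   # orthogonal to x1
x3 = [1.1, -0.9, 1.0, -1.0]   # nearly collinear with x1
```

Orthogonal predictors give a VIF of 1, while nearly collinear ones give very large values, which is what makes the statistic useful for flagging redundant variables.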

Change Detection of Building Objects in Urban Area by Using Transfer Learning (전이학습을 활용한 도시지역 건물객체의 변화탐지)

  • Mo, Jun-sang;Seong, Seon-kyeong;Choi, Jae-wan
    • Korean Journal of Remote Sensing / v.37 no.6_1 / pp.1685-1695 / 2021
  • Generating a high-performance deep learning model requires a large training dataset. However, building a large training dataset in remote sensing takes considerable time and cost. Therefore, transfer learning of deep learning models using small datasets has become increasingly important. In this paper, we performed transfer learning of a model trained on open datasets, using orthoimages and digital maps, to detect changes in building objects in multitemporal orthoimages. For this, initial training was performed with the HRNet-v2 model on an open change detection dataset, and transfer learning was then performed on a dataset built from orthoimages and digital maps. To analyze the effect of transfer learning, the change detection results of various deep learning models, including the transfer-learned model, were evaluated at two test sites. In the experiments, the transfer-learned model achieved the best accuracy compared with the other deep learning models. Therefore, it was confirmed that the problem of insufficient training data could be addressed by transfer learning, and that the change detection algorithm could be effectively applied to various remotely sensed imagery.

Sign Language Dataset Built from S. Korean Government Briefing on COVID-19 (대한민국 정부의 코로나 19 브리핑을 기반으로 구축된 수어 데이터셋 연구)

  • Sim, Hohyun;Sung, Horyeol;Lee, Seungjae;Cho, Hyeonjoong
    • KIPS Transactions on Software and Data Engineering / v.11 no.8 / pp.325-330 / 2022
  • This paper collects a dataset and conducts experiments for deep learning research on Korean sign language, including sign language recognition, sign language translation, and sign language segmentation. Deep learning research on sign language faces several difficulties. First, sign languages are difficult to recognize because they involve multiple modalities, including hand movements, hand orientations, and facial expressions. Second, training data for deep learning research are lacking. Currently, the KETI dataset is the only known Korean sign language dataset for deep learning. Sign language datasets for deep learning research fall into two categories: isolated sign language and continuous sign language. Although several foreign sign language datasets have been collected over time, they are also insufficient for deep learning research on sign language. Therefore, we collected a large-scale Korean sign language dataset and evaluated it using a baseline model, TSPNet, which achieves state-of-the-art (SOTA) performance in sign language translation. The collected dataset consists of a total of 11,402 images and texts. Our experiment with the baseline model on this dataset yields a BLEU-4 score of 3.63, which can serve as the baseline performance for the Korean sign language dataset. We hope that our experience in collecting a Korean sign language dataset helps facilitate further research on Korean sign language.
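
BLEU-4, the metric reported above, combines clipped n-gram precisions for n = 1 to 4 (geometric mean) with a brevity penalty for short candidates. A minimal, unsmoothed, sentence-level version for a single reference is sketched below. Since raw BLEU lies between 0 and 1, a score such as 3.63 is on a 0-100 scale, and published results are typically computed at corpus level, so this hypothetical `bleu4` helper illustrates the formula rather than reproducing the paper's evaluation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Unsmoothed BLEU-4 of one candidate against one reference (both token lists), in [0, 1]."""
    log_precisions = []
    for n in range(1, 5):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clipping: each candidate n-gram counts at most as often as it appears in the reference.
        clipped = sum(min(count, ref[g]) for g, count in cand.items())
        total = sum(cand.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / 4)


reference = "the cat sat on the red mat".split()
score = bleu4("the cat sat on the mat".split(), reference)
```

Without smoothing, any candidate that shares no 4-gram with the reference scores exactly 0, which is one reason sentence-level BLEU is usually smoothed or replaced by corpus-level BLEU in practice.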

Remote Sensing Image Classification for Land Cover Mapping in Developing Countries: A Novel Deep Learning Approach

  • Lynda, Nzurumike Obianuju;Nnanna, Nwojo Agwu;Boukar, Moussa Mahamat
    • International Journal of Computer Science & Network Security / v.22 no.2 / pp.214-222 / 2022
  • Convolutional neural networks (CNNs) are a category of deep learning networks that have proven very effective in computer vision tasks such as image classification. Nevertheless, they have seen little use for remote sensing image classification in developing countries, mainly because of the scarcity of training data. Recently, transfer learning has been used successfully to develop state-of-the-art models for remote sensing (RS) image classification using training and testing data from well-known RS data repositories. However, the ability of such models to classify RS test data from a different dataset has not been sufficiently investigated. In this paper, we propose a deep CNN model that can classify RS test data from a dataset different from the training dataset. To achieve this, we first re-trained a ResNet-50 model on EuroSAT, a large-scale RS dataset, to develop a base model, and then integrated augmentation and ensemble learning to improve its generalization ability. We further evaluated the ability of this model to classify a novel dataset (Nig_Images). The final classification results show that our model achieves 96% and 80% accuracy on the EuroSAT and Nig_Images test data, respectively. Adequate knowledge and use of this framework are expected to encourage research on, and use of, deep CNNs for land cover mapping where training data are lacking, as is the case in developing countries.
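
The abstract does not specify how the ensemble combines its members; one common choice is soft voting, in which the class probabilities predicted by several models (for example, models trained with different augmentations) are averaged and the highest average wins. The sketch below assumes that scheme; the `soft_vote` name and the probability values are illustrative.

```python
def soft_vote(prob_lists):
    """Average per-class probabilities across models; return (winning class index, averaged probabilities)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(probs[c] for probs in prob_lists) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return best, avg


# Three hypothetical models scoring one image over three land-cover classes.
label, avg = soft_vote([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.4, 0.3]])
```

Here the first model alone would pick class 0, but the averaged probabilities favor class 1; averaging tends to smooth out individual models' errors, which is the usual motivation for ensembling.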

Dog-Species Classification through CycleGAN and Standard Data Augmentation

  • Chan, Park;Nammee, Moon
    • Journal of Information Processing Systems / v.19 no.1 / pp.67-79 / 2023
  • In the image field, data augmentation refers to increasing the amount of data through editing operations such as rotating or cropping photos. In this study, generative adversarial network (GAN) images were created using CycleGAN, and dogs of various colors were represented through data augmentation. Specifically, dog data from the Stanford Dogs Dataset and the Oxford-IIIT Pet Dataset were used, and 10 dog breeds with 300 images each were selected. GAN images were then generated using CycleGAN, and four training groups were established: 2,000 original photos (group I); 2,000 original photos + 1,000 GAN images (group II); 3,000 original photos (group III); and 3,000 original photos + 1,000 GAN images (group IV). The amount of data in each training group was augmented using conventional data augmentation methods such as rotating, cropping, erasing, and distorting. The augmented photo data were used to train MobileNet_v3_Large, ResNet-152, InceptionResNet_v2, and NASNet_Large to evaluate classification accuracy and loss. The top-3 accuracy of each deep neural network model was as follows: MobileNet_v3_Large, 86.4% (group I), 85.4% (group II), 90.4% (group III), and 89.2% (group IV); ResNet-152, 82.4% (group I), 83.7% (group II), 84.7% (group III), and 84.9% (group IV); InceptionResNet_v2, 90.7% (group I), 88.4% (group II), 93.3% (group III), and 93.1% (group IV); and NASNet_Large, 85% (group I), 88.1% (group II), 91.8% (group III), and 92% (group IV). The InceptionResNet_v2 model exhibited the highest image classification accuracy, and the NASNet_Large model exhibited the largest accuracy gain from data augmentation.
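
Top-3 accuracy, the metric reported above, counts a prediction as correct when the true breed is among the three highest-scoring classes. A minimal sketch with a hypothetical `top_k_accuracy` helper operating on per-class scores (the scores and labels below are made up for illustration):

```python
def top_k_accuracy(scores, labels, k=3):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score; ties keep index order.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


# Hypothetical scores for three images over four classes, with their true labels.
scores = [[0.1, 0.2, 0.3, 0.4],
          [0.4, 0.3, 0.2, 0.1],
          [0.2, 0.35, 0.3, 0.15]]
labels = [0, 3, 1]
```

With k equal to the number of classes every prediction counts as a hit, so top-k accuracy is only informative when k is well below the number of classes (10 breeds in this study).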

Automatic Extraction of Training Dataset Using Expectation Maximization Algorithm - for Automatic Supervised Classification of Road Networks (기대최대화 알고리즘을 활용한 도로노면 training 자료 자동추출에 관한 연구 - 감독분류를 통한 도로 네트워크의 자동추출을 위하여)

  • Han, You-Kyung;Choi, Jae-Wan;Lee, Jae-Bin;Yu, Ki-Yun;Kim, Yong-Il
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.27 no.2 / pp.289-297 / 2009
  • In this paper, we propose a methodology for automatically extracting a training dataset for supervised classification of road networks. For preprocessing, we co-register airborne photos, LIDAR data, and large-scale digital maps, and then create orthophotos and intensity images. By overlaying the large-scale digital maps onto the generated images, we extract an initial training dataset for supervised classification of road networks. However, this initial training information is distorted because errors propagate from the registration process and because road areas generally contain various objects such as asphalt, road markings, vegetation, and cars. Therefore, to generate training information for the road surface only, we apply the Expectation Maximization technique and finally extract the training dataset of the road surface. For the accuracy test, we compare the automatically extracted training dataset with manually extracted ones. Statistical tests confirm that the developed method is valid.
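
The Expectation Maximization step can be illustrated with a two-component, one-dimensional Gaussian mixture fitted to pixel intensities sampled inside the initial road mask, where one component is meant to capture the road surface and the other the remaining objects. This is a simplified sketch under stated assumptions: the abstract does not give the features, number of components, or initialization the authors used, and the intensity values and function names (`gaussian`, `em_two_gaussians`) are hypothetical.

```python
import math


def gaussian(x, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(xs, iters=100):
    """Fit a two-component 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    n = len(xs)
    mu = sum(xs) / n
    var0 = sum((x - mu) ** 2 for x in xs) / n
    # Start one component at the darkest sample and one at the brightest.
    means, variances, weights = [min(xs), max(xs)], [var0, var0], [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p = [w * gaussian(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            # A small floor keeps a collapsing component from reaching zero variance.
            variances[k] = max(sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6)
    return weights, means, variances


# Hypothetical normalized intensities inside the initial mask: darker asphalt vs. brighter objects.
road = [0.20, 0.22, 0.25, 0.21, 0.23, 0.24]
other = [0.80, 0.85, 0.82, 0.88, 0.83]
weights, means, variances = em_two_gaussians(road + other)
```

Pixels whose responsibility for the road component exceeds one half would then be kept as road-surface training samples; real data would likely need more components or multivariate features.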