• Title/Summary/Keyword: Training Datasets

Synthetic data augmentation for pixel-wise steel fatigue crack identification using fully convolutional networks

  • Zhai, Guanghao;Narazaki, Yasutaka;Wang, Shuo;Shajihan, Shaik Althaf V.;Spencer, Billie F. Jr.
    • Smart Structures and Systems / v.29 no.1 / pp.237-250 / 2022
  • Structural health monitoring (SHM) plays an important role in ensuring the safety and functionality of critical civil infrastructure. In recent years, numerous researchers have developed computer vision and machine learning techniques for SHM, offering the potential to make field inspections less laborious and more effective. However, high-quality vision data from various types of damaged structures is relatively difficult to obtain because damaged structures are rare. The lack of data is particularly acute for fatigue cracks in steel bridge girders. As a result, the lack of training data is one of the main issues hindering wider application of these powerful techniques for SHM. To address this problem, this article proposes using synthetic data to augment real-world datasets for training neural networks that identify fatigue cracks in steel structures. First, random textures representing the surface of steel structures with fatigue cracks are created and mapped onto a 3D graphics model. Subsequently, this model is used to generate synthetic images under various lighting conditions and camera angles. A fully convolutional network is then trained for two cases: (1) using only real-world data, and (2) using both synthetic and real-world data. With synthetic data augmentation in the training process, the crack identification performance of the network on the test dataset improves from 35% to 40% for intersection over union (IoU) and from 49% to 62% for precision, demonstrating the efficacy of the proposed approach.
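    The two metrics reported in this abstract can be illustrated with a minimal sketch. It assumes binary crack masks (1 = crack pixel) and uses a hypothetical helper name, `iou_and_precision`; the paper's own evaluation code is not given in the abstract.

```python
import numpy as np

def iou_and_precision(pred_mask, true_mask):
    """Return (IoU, precision) for two binary masks of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    tp = np.logical_and(pred, true).sum()    # crack pixels predicted as crack
    fp = np.logical_and(pred, ~true).sum()   # background predicted as crack
    fn = np.logical_and(~pred, true).sum()   # crack pixels that were missed
    union = tp + fp + fn
    iou = tp / union if union > 0 else 0.0
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    return float(iou), float(precision)

# Toy 3x3 masks, purely illustrative (not data from the paper).
pred = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
true = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])
iou, precision = iou_and_precision(pred, true)
```

    Here two pixels are true positives, one is a false positive and one is a false negative, so the IoU is 2/4 and the precision is 2/3.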

A Comparison of Meta-learning and Transfer-learning for Few-shot Jamming Signal Classification

  • Jin, Mi-Hyun;Koo, Ddeo-Ol-Ra;Kim, Kang-Suk
    • Journal of Positioning, Navigation, and Timing / v.11 no.3 / pp.163-172 / 2022
  • Typical anti-jamming technologies based on array antennas, Space Time Adaptive Process (STAP) and Space Frequency Adaptive Process (SFAP), are very effective algorithms for nulling and beamforming. However, they do not perform equally well for all types of jamming signals. If the anti-jamming algorithm is not optimized for each signal type, anti-jamming performance deteriorates and unnecessary computation degrades the operational stability of the system. Therefore, a jamming classification technique is required to obtain optimal anti-jamming performance. Machine learning, which has recently been in the spotlight, can be used to classify jamming signals. In general, supervised learning for classification requires a huge amount of data and retraining for unfamiliar signals. For jamming signal classification, it is difficult to obtain a large amount of data because an outdoor jamming signal reception environment is difficult to configure and the attacker's signal type is unknown. Therefore, this paper proposes a few-shot jamming signal classification technique that uses meta-learning and transfer-learning to train the model with a small amount of data. A training dataset is constructed from the anti-jamming algorithm input data within the GNSS receiver when jamming signals are applied. For meta-learning, the Model-Agnostic Meta-Learning (MAML) algorithm with a general Convolutional Neural Network (CNN) model is used, and the same CNN model is used for transfer-learning. Both are trained through episodic training using training datasets on our Python-based simulator. The results show that both algorithms can be trained with less data and respond immediately to new signal types. The performance of the two algorithms is also compared to determine which is more suitable for classifying jamming signals.
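    Episodic training draws small N-way K-shot tasks from the labeled pool. The sketch below shows one common way to sample such an episode; the function name `sample_episode`, the class count and the shot sizes are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Pick n_way classes, then k_shot support and n_query query indices per class."""
    labels = np.asarray(labels)
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))
        if len(idx) < k_shot + n_query:
            raise ValueError("not enough samples for class %r" % c)
        support.extend(idx[:k_shot])                   # used to adapt the model
        query.extend(idx[k_shot:k_shot + n_query])     # used to evaluate the adapted model
    return np.array(support), np.array(query), classes

# Hypothetical pool: 5 jamming signal types with 20 samples each.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(5), 20)
support_idx, query_idx, episode_classes = sample_episode(
    labels, n_way=3, k_shot=5, n_query=10, rng=rng)
```

    In a MAML-style loop, the support indices would drive the inner adaptation step and the query indices the outer meta-update; a transfer-learning baseline could reuse the same sampler for fine-tuning.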

Detecting Greenhouses from the Planetscope Satellite Imagery Using the YOLO Algorithm (YOLO 알고리즘을 활용한 Planetscope 위성영상 기반 비닐하우스 탐지)

  • Seongsu KIM;Youn-In CHUNG;Yun-Jae CHOUNG
    • Journal of the Korean Association of Geographic Information Studies / v.26 no.4 / pp.27-39 / 2023
  • Detecting greenhouses from remote sensing datasets is useful for identifying illegal agricultural facilities and predicting the agricultural output of greenhouses. This research proposes a methodology for automatically detecting greenhouses from Planetscope satellite imagery acquired over Gimje City, using deep learning in a series of steps. First, multiple fixed-size training images containing greenhouse features were generated from five training Planetscope satellite images. Next, the YOLO (You Only Look Once) model was trained using the generated training images. Finally, the greenhouse features were detected from the input Planetscope satellite image. Statistical results showed that 76.4% of the greenhouse features in the input Planetscope satellite imagery were detected by the trained YOLO model. In future research, high-resolution satellite imagery with a spatial resolution finer than 1 m should be used to detect more greenhouse features.
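    The first step, cutting large scenes into fixed-size training images, can be sketched as a sliding-window tiler. The tile size, stride and scene dimensions below are hypothetical values chosen for illustration; the abstract does not state the ones the authors used.

```python
import numpy as np

def tile_image(image, tile_size, stride):
    """Return a list of ((top, left), tile) pairs covering the image on a regular grid."""
    h, w = image.shape[:2]
    tiles = []
    for top in range(0, h - tile_size + 1, stride):
        for left in range(0, w - tile_size + 1, stride):
            tile = image[top:top + tile_size, left:left + tile_size]
            tiles.append(((top, left), tile))
    return tiles

# Placeholder scene standing in for one satellite image (rows, cols, bands).
scene = np.zeros((1000, 1200, 4), dtype=np.uint16)
tiles = tile_image(scene, tile_size=416, stride=208)
```

    With these assumed values the scene yields a 3 x 4 grid of overlapping tiles. Pixels beyond the last full tile are dropped here; padding the scene first is one alternative.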

Evaluating the Impact of Training Conditions on the Performance of GPT-2-Small Based Korean-English Bilingual Models

  • Euhee Kim;Keonwoo Koo
    • Journal of the Korea Society of Computer and Information / v.29 no.9 / pp.69-77 / 2024
  • This study evaluates the performance of second language acquisition models learning Korean and English using the GPT-2-Small model, analyzing the impact of various training conditions on performance. Four training conditions were used: monolingual learning, sequential learning, sequential-interleaved learning, and sequential-EWC learning. The model was trained on Korean datasets from the National Institute of Korean Language and English datasets from the BabyLM Challenge, with performance measured by perplexity (PPL) and BLiMP accuracy. Results showed that monolingual learning performed best, with a PPL of 16.2 and a BLiMP accuracy of 73.7%. In contrast, sequential-EWC learning had the highest PPL, 41.9, and the lowest BLiMP accuracy, 66.3% (p < 0.05). Monolingual learning proved most effective for optimizing model performance. The EWC regularization in sequential-EWC learning degraded performance by limiting weight updates, which hindered learning of the new language. This research improves understanding of language modeling and contributes to cognitive similarity in AI language learning.
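    Two quantities in this abstract have compact standard definitions: perplexity is the exponential of the mean negative log-probability per token, and the EWC (elastic weight consolidation) penalty is a Fisher-weighted quadratic pull toward the weights learned on the earlier task. The sketch below assumes natural-log token probabilities and a diagonal Fisher estimate; the function names and toy numbers are illustrative only.

```python
import numpy as np

def perplexity(token_log_probs):
    """PPL = exp(-mean of natural-log token probabilities)."""
    return float(np.exp(-np.mean(token_log_probs)))

def ewc_penalty(theta, theta_star, fisher, lam):
    """(lam / 2) * sum_i F_i * (theta_i - theta_star_i)**2, added to the new-task loss."""
    theta = np.asarray(theta, dtype=float)
    theta_star = np.asarray(theta_star, dtype=float)
    fisher = np.asarray(fisher, dtype=float)
    return float(0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2))

# A model that assigns probability 0.25 to every token has perplexity 4.
ppl = perplexity(np.log([0.25, 0.25, 0.25, 0.25]))

# Toy weights: moving away from theta_star is penalized in proportion to the Fisher values.
penalty = ewc_penalty(theta=[1.0, 2.0], theta_star=[0.0, 0.0], fisher=[1.0, 0.5], lam=2.0)
```

    A large `lam` or large Fisher values keep the weights close to their first-language values, which is consistent with the abstract's observation that the regularizer limited weight updates.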

A Crowdsourcing-Based Paraphrased Opinion Spam Dataset and Its Implication on Detection Performance (크라우드소싱 기반 문장재구성 방법을 통한 의견 스팸 데이터셋 구축 및 평가)

  • Lee, Seongwoon;Kim, Seongsoon;Park, Donghyeon;Kang, Jaewoo
    • KIISE Transactions on Computing Practices / v.22 no.7 / pp.338-343 / 2016
  • Today, opinion reviews on the Web are often used as a means of information exchange. As the importance of opinion reviews continues to grow, the problem of opinion spam grows with it. Even though many studies on detecting spam reviews have been conducted, limitations of the gold-standard datasets hinder research. Therefore, we introduce a new dataset called "Paraphrased Opinion Spam (POS)" that contains a new type of review spam that imitates truthful reviews. We have noticed that spammers refer to existing truthful reviews to fabricate spam reviews. To create such a seemingly truthful review spam dataset, we asked task participants to paraphrase truthful reviews to create new deceptive reviews. The experimental results show that classifying our POS dataset is more difficult than classifying existing spam datasets, since the reviews in our dataset are linguistically closer to truthful reviews. Training volume was also found to be an important factor in classification model performance.

A new method for automatic areal feature matching based on shape similarity using CRITIC method (CRITIC 방법을 이용한 형상유사도 기반의 면 객체 자동매칭 방법)

  • Kim, Ji-Young;Huh, Yong;Kim, Doe-Sung;Yu, Ki-Yun
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.29 no.2 / pp.113-121 / 2011
  • In this paper, we propose a method for automatically matching areal features based on shape similarity using spatial information. To do this, we extracted candidate matching pairs that intersect between two different spatial datasets and then measured their shape similarity, calculated as a weighted sum of the matching criteria with weights derived automatically by the CRITIC method. Matching pairs were selected when their similarity exceeded a threshold determined by adjusted-boxplot outlier detection on training data. After applying this method to two distinct spatial datasets, a digital topographic map and a street-name address base map, we confirmed in visual evaluation that the matched buildings have similar shapes and large overlapping areas, and in statistical evaluation that the F-measure is as high as 0.932.
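    The CRITIC weighting step can be sketched as follows. In its usual formulation, each criterion's weight is proportional to its standard deviation multiplied by the sum of (1 - correlation) with every other criterion, computed on min-max normalized scores. The three criteria columns and their values below are hypothetical, and every criterion is assumed to vary across the candidate pairs.

```python
import numpy as np

def critic_weights(scores):
    """scores: (n_pairs, n_criteria) array. Returns CRITIC weights that sum to 1."""
    X = np.asarray(scores, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    if np.any(span == 0):
        raise ValueError("each criterion must vary across the candidate pairs")
    Z = (X - X.min(axis=0)) / span                 # min-max normalization per criterion
    sigma = Z.std(axis=0, ddof=1)                  # contrast intensity of each criterion
    corr = np.corrcoef(Z, rowvar=False)            # correlation between criteria
    info = sigma * (1.0 - corr).sum(axis=0)        # information carried by each criterion
    return info / info.sum()

# Hypothetical criterion scores for four candidate matching pairs.
scores = np.array([[0.9, 0.8, 0.7],
                   [0.4, 0.9, 0.2],
                   [0.7, 0.3, 0.9],
                   [0.2, 0.5, 0.4]])
weights = critic_weights(scores)
similarity = scores @ weights                      # weighted-sum shape similarity per pair
```

    Pairs whose `similarity` exceeds the chosen threshold would then be accepted as matches; the adjusted-boxplot thresholding described in the abstract is not reproduced here.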

Performance Evaluation of Machine Learning Optimizers (기계학습 옵티마이저 성능 평가)

  • Joo, Gihun;Park, Chihyun;Im, Hyeonseung
    • Journal of IKEEE / v.24 no.3 / pp.766-776 / 2020
  • Recently, as interest in machine learning (ML) has increased and research using ML has become active, finding an optimal hyperparameter combination for various ML models has become more important. In this paper, among various hyperparameters, we focus on ML optimizers, and measure and compare the performance of major optimizers using various datasets. In particular, we compare nine optimizers, ranging from SGD, the most basic, to Momentum, NAG, AdaGrad, RMSProp, AdaDelta, Adam, AdaMax, and Nadam, using the MNIST, CIFAR-10, IRIS, TITANIC, and Boston Housing Price datasets. Experimental results showed that with Adam or Nadam, the loss of various ML models decreased most rapidly and their F1 scores also increased. Meanwhile, AdaMax showed considerable instability during training, and AdaDelta showed slower convergence and lower performance than the other optimizers.
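    To make the difference between the simplest and one of the adaptive optimizers concrete, here is a minimal NumPy sketch of the plain SGD and Adam update rules applied to a toy quadratic loss. The loss, learning rates and step count are assumptions for illustration and have no connection to the paper's experiments.

```python
import numpy as np

def grad(w):
    """Gradient of the toy loss f(w) = 0.5 * (w[0]**2 + 10 * w[1]**2)."""
    return np.array([w[0], 10.0 * w[1]])

def sgd_step(w, g, state, lr=0.05):
    """Plain gradient descent: move against the gradient by a fixed step size."""
    return w - lr * g, state

def adam_step(w, g, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: bias-corrected running means of the gradient and its square."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g ** 2
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

def run(step, w0, state, n_steps):
    """Apply an update rule repeatedly and return the final weights."""
    w = np.array(w0, dtype=float)
    for _ in range(n_steps):
        w, state = step(w, grad(w), state)
    return w

w_sgd = run(sgd_step, [1.0, 1.0], None, 200)
w_adam = run(adam_step, [1.0, 1.0], (np.zeros(2), np.zeros(2), 0), 200)
```

    Note that Adam's first update has magnitude close to `lr` in every coordinate regardless of gradient scale, which is the per-parameter normalization that distinguishes it from SGD.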

Outlier Detection in Time Series Monitoring Datasets using Rule Based and Correlation Analysis Method (규칙기반 및 상관분석 방법을 이용한 시계열 계측 데이터의 이상치 판정)

  • Jeon, Jesung;Koo, Jakap;Park, Changmok
    • Journal of the Korean GEO-environmental Society / v.16 no.5 / pp.43-53 / 2015
  • In this study, outlier detection methods were developed for various kinds of monitoring data that fall into the big data category, and outlier detection was conducted on both artificial data and real field monitoring data. Rule-based methods that apply the rate of change and the probability of error to monitoring data are effective at detecting large-scale short faults and constant faults that show no change over a certain period. However, because they rely on a single independent dataset, they can misjudge normal data with large-scale variation as outliers. Rule-based methods for detecting noise faults are of limited use on real monitoring data because it is difficult to choose a proper data window size and to find a threshold for outlier judgment. Correlation analysis between two different datasets was very effective at detecting localized outliers and abnormal variation in short- and long-term monitoring datasets, provided that a reasonable range of training data could be selected.
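    The kinds of checks described here can be sketched in a few lines. The functions below are simplified stand-ins, assuming evenly sampled series: a rate-of-change rule for short faults, a flat-run rule for constant faults, and a windowed correlation between two sensors. Names, thresholds and data are hypothetical and do not reproduce the paper's rules.

```python
import numpy as np

def rate_of_change_outliers(x, max_step):
    """Flag samples whose jump from the previous sample exceeds max_step."""
    x = np.asarray(x, dtype=float)
    flags = np.zeros(x.shape, dtype=bool)
    flags[1:] = np.abs(np.diff(x)) > max_step
    return flags

def constant_fault(x, window, tol=0.0):
    """Flag every sample inside a run of `window` values that vary by at most tol."""
    x = np.asarray(x, dtype=float)
    flags = np.zeros(x.shape, dtype=bool)
    for end in range(window, len(x) + 1):
        segment = x[end - window:end]
        if segment.max() - segment.min() <= tol:
            flags[end - window:end] = True
    return flags

def window_correlation(a, b, window):
    """Pearson correlation of two series over consecutive non-overlapping windows."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    starts = range(0, len(a) - window + 1, window)
    return np.array([np.corrcoef(a[s:s + window], b[s:s + window])[0, 1] for s in starts])

spike_flags = rate_of_change_outliers([1.0, 1.1, 1.2, 9.0, 1.3, 1.4], max_step=1.0)
flat_flags = constant_fault([1, 2, 2, 2, 2, 3], window=3)
corr = window_correlation([0, 1, 2, 3, 0, 1, 2, 3], [0, 2, 4, 6, 3, 2, 1, 0], window=4)
```

    A single spike trips the rate-of-change rule twice, once on the jump up and once on the return, so both samples are flagged. A window in which two normally related sensors stop moving together shows up as a drop in `corr`.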

Application and Performance Analysis of Double Pruning Method for Deep Neural Networks (심층신경망의 더블 프루닝 기법의 적용 및 성능 분석에 관한 연구)

  • Lee, Seon-Woo;Yang, Ho-Jun;Oh, Seung-Yeon;Lee, Mun-Hyung;Kwon, Jang-Woo
    • Journal of Convergence for Information Technology / v.10 no.8 / pp.23-34 / 2020
  • Recently, deep learning has been hard to commercialize because of the high computing power it requires and the cost of computing resources. In this paper, we apply a double pruning technique and evaluate its performance on deep neural networks and various datasets. Double pruning combines basic network slimming and parameter pruning. The proposed technique has the advantage of removing parameters that are unimportant to learning and improving speed without compromising accuracy. After training on various datasets, the pruning ratio was increased to reduce the size of the model. NetScore performance analysis confirmed that MobileNet-V3 showed the highest performance. On the CIFAR-10 dataset, performance after pruning was highest for MobileNet-V3, which consists of depthwise separable convolutional neural networks, and the traditional convolutional neural networks VGGNet and ResNet also improved significantly.
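    As a rough illustration of combining the two pruning stages, the sketch below first drops whole output channels with small batch-normalization scale factors (the criterion used in network slimming) and then zeroes the smallest-magnitude remaining weights. The ratios, array shapes and function names are assumptions; the abstract does not specify the authors' exact procedure.

```python
import numpy as np

def slim_channels(weight, bn_gamma, keep_ratio):
    """Keep the output channels (rows) with the largest |gamma|. Returns (weight, kept indices)."""
    bn_gamma = np.asarray(bn_gamma, dtype=float)
    n_keep = max(1, int(round(keep_ratio * len(bn_gamma))))
    keep = np.sort(np.argsort(-np.abs(bn_gamma))[:n_keep])
    return weight[keep], keep

def magnitude_prune(weight, sparsity):
    """Zero out roughly the fraction `sparsity` of entries with the smallest magnitude."""
    k = int(sparsity * weight.size)
    pruned = weight.copy()
    if k == 0:
        return pruned
    threshold = np.sort(np.abs(weight), axis=None)[k - 1]
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

# Toy layer: 4 output channels with 3 weights each, plus one BN scale factor per channel.
weight = np.arange(1, 13, dtype=float).reshape(4, 3)
gamma = [0.9, 0.01, 0.5, 0.02]
slimmed, kept = slim_channels(weight, gamma, keep_ratio=0.5)   # stage 1: channel level
pruned = magnitude_prune(slimmed, sparsity=0.5)                # stage 2: weight level
```

    In practice both stages are normally followed by fine-tuning to recover accuracy, which this sketch omits.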

A Development of Façade Dataset Construction Technology Using Deep Learning-based Automatic Image Labeling (딥러닝 기반 이미지 자동 레이블링을 활용한 건축물 파사드 데이터세트 구축 기술 개발)

  • Gu, Hyeong-Mo;Seo, Ji-Hyo;Choo, Seung-Yeon
    • Journal of the Architectural Institute of Korea Planning & Design / v.35 no.12 / pp.43-53 / 2019
  • The construction industry has made great strides in the past decades by utilizing computer programs including CAD. However, compared to other manufacturing sectors, labor productivity is low because a high proportion of work consists of knowledge-based tasks in addition to simple repetitive tasks. Therefore, the efficiency of workers' knowledge-based tasks should be improved by enabling computers to recognize visual information. A computer needs a large amount of training data, such as that of the ImageNet project, to recognize visual information. This study aims to propose a building facade dataset that is constructed efficiently by quickly collecting building facade data through a portal site's road view and labeling it automatically using deep learning, as part of constructing image datasets for visual recognition of construction by computers. Using the proposed method, we constructed a dataset for part of Dongseong-ro, Daegu Metropolitan City and analyzed the utility and reliability of the dataset. Through this, it was confirmed that the computer could extract significant facade information from the portal site road view by recognizing the visual information in building facade images. In addition to verifying the feasibility of constructing building image datasets, this study suggests the possibility of securing quantitative and qualitative facade design knowledge by extracting it from any facade in the world.