• Title/Summary/Keyword: Neural network Transformer

109 search results

Diagnosis of Transformer Aging using Discrete Wavelet Analysis and Neural Network (이산 웨이블렛 분석과 신경망을 이용한 변압기 열화의 진단)

  • 박재준;윤만영;오승헌;김진승;김성홍;백관현;송영철;권동진
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference
    • /
    • 2000.07a
    • /
    • pp.645-650
    • /
    • 2000
  • The discrete wavelet transform is used as preprocessing for a neural network (NN) to identify the aging state of internal partial discharge in a transformer. The discrete wavelet transform produces wavelet coefficients that are used for classification, and the mean values of these coefficients are fed into a back-propagation neural network. After training, the network can decide whether a test signal corresponds to an early aging state, a late aging state, or a normal state.

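The pipeline in the entry above (wavelet decomposition, mean coefficient per band, back-propagation classifier) can be illustrated with a minimal sketch. The wavelet family ('db4'), decomposition level, network size, and the synthetic signals below are assumptions for illustration, not details given in the paper.

```python
import numpy as np
import pywt
from sklearn.neural_network import MLPClassifier

def dwt_mean_features(signal, wavelet="db4", level=4):
    """Decompose a PD signal and return the mean of each wavelet coefficient band."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([c.mean() for c in coeffs])

# Synthetic stand-ins for normal / early-aging / late-aging PD signals.
rng = np.random.default_rng(0)
X, y = [], []
for label, scale in enumerate([0.5, 1.0, 2.0]):          # 0=normal, 1=early, 2=late
    for _ in range(50):
        X.append(dwt_mean_features(scale * rng.standard_normal(512)))
        y.append(label)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(np.array(X), y)                                   # back-propagation training
print(clf.predict(dwt_mean_features(rng.standard_normal(512)).reshape(1, -1)))
```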

Partial Discharge Pattern Recognition of Cast Resin Current Transformers Using Radial Basis Function Neural Network

  • Chang, Wen-Yeau
    • Journal of Electrical Engineering and Technology
    • /
    • v.9 no.1
    • /
    • pp.293-300
    • /
    • 2014
  • This paper proposes a novel pattern recognition approach based on the radial basis function (RBF) neural network for identifying insulation defects of high-voltage electrical apparatus arising from partial discharge (PD). Pattern recognition of PD is used for identifying the defects causing the PD, such as internal discharge, external discharge, and corona. This information is vital for estimating the harmfulness of the discharge to the insulation. Since each insulation defect produces a corresponding particular PD pattern, pattern recognition of PD is a significant means of discriminating the insulation conditions of high-voltage electrical apparatus. To verify the proposed approach, experiments were conducted on field-test PD pattern recognition of cast resin current transformer (CRCT) models, using artificial defects created to produce the common PD activities of CRCTs and feature vectors extracted from the field-test PD patterns. The significant features are extracted with the nonlinear principal component analysis (NLPCA) method. The experimental data are found to be in close agreement with the recognized data, and the test results show that the proposed approach is efficient and reliable.
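
A minimal sketch of an RBF-network classifier in the spirit of the entry above: Gaussian activations around cluster centres followed by a linear readout. KernelPCA stands in for the NLPCA feature extraction, and the synthetic feature vectors, centre count, and gamma are illustrative assumptions rather than the authors' settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Synthetic stand-ins for field-test PD feature vectors of three defect classes.
X = np.vstack([rng.normal(m, 0.3, size=(60, 10)) for m in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 60)          # internal discharge / external discharge / corona

# Nonlinear feature reduction (KernelPCA used here as a stand-in for NLPCA).
Z = KernelPCA(n_components=3, kernel="rbf").fit_transform(X)

# RBF layer: Gaussian activations around KMeans centres.
centres = KMeans(n_clusters=12, n_init=10, random_state=1).fit(Z).cluster_centers_

def rbf(features, centres, gamma=2.0):
    d2 = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

# A linear readout on top of the RBF activations completes the RBF network.
readout = LogisticRegression(max_iter=1000).fit(rbf(Z, centres), y)
print("training accuracy:", readout.score(rbf(Z, centres), y))
```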

Hyperparameter experiments on end-to-end automatic speech recognition

  • Yang, Hyungwon;Nam, Hosung
    • Phonetics and Speech Sciences
    • /
    • v.13 no.1
    • /
    • pp.45-51
    • /
    • 2021
  • End-to-end (E2E) automatic speech recognition (ASR) has achieved promising performance gains with the introduction of the self-attention network, Transformer. However, due to long training times and the number of hyperparameters, finding the optimal hyperparameter set is computationally expensive. This paper investigates the impact of hyperparameters in the Transformer network to answer two questions: which hyperparameters play a critical role in task performance, and which in training speed. The Transformer network used for training consists of encoder and decoder networks combined with Connectionist Temporal Classification (CTC). We trained the model on Wall Street Journal (WSJ) SI-284 and tested on dev93 and eval92. Seventeen hyperparameters were selected from the ESPnet training configuration, and varying ranges of values were used for the experiments. The results show that the "num blocks" and "linear units" hyperparameters in the encoder and decoder networks reduce the Word Error Rate (WER) significantly, and the performance gain is more prominent when they are altered in the encoder network. Training duration also increases linearly as the "num blocks" and "linear units" values grow. Based on the experimental results, we collected the optimal value of each hyperparameter and reduced the WER to 2.9/1.9 on dev93 and eval92, respectively.
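
The effect of the two hyperparameters highlighted above can be sketched with a plain PyTorch Transformer encoder, where "num blocks" maps to the number of layers and "linear units" to the feed-forward width; the model dimension and head count below are assumptions, not the ESPnet defaults.

```python
import torch.nn as nn

def encoder_params(num_blocks, linear_units, d_model=256, nhead=4):
    """Parameter count of a Transformer encoder for the given depth and FF width."""
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                       dim_feedforward=linear_units)
    encoder = nn.TransformerEncoder(layer, num_layers=num_blocks)
    return sum(p.numel() for p in encoder.parameters())

for blocks in (6, 12):
    for units in (1024, 2048):
        print(f"num_blocks={blocks:2d} linear_units={units:4d} "
              f"-> {encoder_params(blocks, units) / 1e6:.1f}M parameters")
```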

Lightening of Human Pose Estimation Algorithm Using MobileViT and Transfer Learning

  • Kunwoo Kim;Jonghyun Hong;Jonghyuk Park
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.9
    • /
    • pp.17-25
    • /
    • 2023
  • In this paper, we propose a MobileViT-based model that performs human pose estimation with fewer parameters and faster estimation. The base model achieves its light weight through a structure that combines features of convolutional neural networks with those of the Vision Transformer. The Transformer, the key mechanism in this study, has become more influential as Transformer-based models outperform convolutional neural network-based models in computer vision. Similarly, in human pose estimation, the Vision Transformer-based ViTPose maintains the best performance on all major human pose estimation benchmarks such as COCO, OCHuman, and MPII. However, because the Vision Transformer has a heavy model structure with a large number of parameters and requires a relatively large amount of computation, training it is costly for users. Accordingly, the base model compensates for the Vision Transformer's weak inductive bias, which otherwise demands a large amount of computation, by building local representations through a convolutional neural network structure. Finally, the proposed model obtains a mean average precision of 0.694 on the MS COCO benchmark with 3.28 GFLOPs and 9.72 million parameters, roughly 1/5 and 1/9 of ViTPose, respectively.
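
A minimal sketch of the kind of lightweight pose model described above: a MobileViT backbone with a small deconvolution head that predicts one heatmap per joint. The timm model name, head layout, and joint count are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import timm

class MobileViTPose(nn.Module):
    def __init__(self, num_joints=17):
        super().__init__()
        # MobileViT backbone from timm (model name assumed; pretrained weights off).
        self.backbone = timm.create_model("mobilevit_s", pretrained=False, num_classes=0)
        c = self.backbone.num_features              # channels of the final feature map
        self.head = nn.Sequential(                  # upsample and predict joint heatmaps
            nn.ConvTranspose2d(c, 256, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(256, 256, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(256, num_joints, 1),
        )

    def forward(self, x):
        return self.head(self.backbone.forward_features(x))

model = MobileViTPose()
print(sum(p.numel() for p in model.parameters()) / 1e6, "M parameters")
print(model(torch.randn(1, 3, 256, 256)).shape)     # heatmaps: (1, 17, 32, 32)
```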

Real-Time Fire Detection Method Using YOLOv8 (YOLOv8을 이용한 실시간 화재 검출 방법)

  • Tae Hee Lee;Chun-Su Park
    • Journal of the Semiconductor & Display Technology
    • /
    • v.22 no.2
    • /
    • pp.77-80
    • /
    • 2023
  • Since fires in uncontrolled environments pose serious risks to society and individuals, many researchers have been investigating technologies for the early detection of fires that occur in everyday life. Recently, with the development of deep learning vision technology, research on fire detection models using neural network backbones such as the Transformer and the Convolutional Neural Network has been actively conducted. Vision-based fire detection systems can solve many of the problems of physical sensor-based fire detection systems. This paper proposes a fire detection method using the latest YOLOv8, which improves on existing fire detection methods. The proposed method develops a system that detects sparks and smoke in input images by training the YOLOv8 model on a universal fire detection dataset. We also demonstrate the superiority of the proposed method through experiments comparing it with existing methods.

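A minimal sketch of fine-tuning YOLOv8 with the ultralytics API, as described above; "fire.yaml" and "test_scene.jpg" are hypothetical file names, and the training settings are illustrative rather than the paper's configuration.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                               # pretrained nano model
model.train(data="fire.yaml", epochs=100, imgsz=640)     # fine-tune on a fire/smoke dataset
results = model.predict("test_scene.jpg", conf=0.25)     # detect on a new image
for box in results[0].boxes:                             # class id, confidence, bounding box
    print(int(box.cls), float(box.conf), box.xyxy.tolist())
```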

U-net with vision transformer encoder for polyp segmentation in colonoscopy images (비전 트랜스포머 인코더가 포함된 U-net을 이용한 대장 내시경 이미지의 폴립 분할)

  • Ayana, Gelan;Choe, Se-woon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2022.10a
    • /
    • pp.97-99
    • /
    • 2022
  • For the early identification and treatment of colorectal cancer, accurate polyp segmentation is crucial. However, polyp segmentation is a challenging task, and the majority of current approaches struggle with two issues. First, the position, size, and shape of each individual polyp vary greatly (intra-class inconsistency). Second, there is a significant degree of similarity between polyps and their surroundings under certain circumstances, such as motion blur and light reflection (inter-class indistinction). U-net, which is composed of convolutional neural networks as encoder and decoder, is considered a standard for tackling this task. We propose an updated U-net architecture that replaces the encoder with a vision transformer network for polyp segmentation. The proposed architecture performed better than the standard U-net architecture on the polyp segmentation task.

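A minimal sketch of the architecture described above: patch embeddings encoded by a Transformer, reshaped back into a feature map, and decoded by a convolutional upsampling path (U-net skip connections are omitted for brevity). Patch size, widths, and depth are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ViTEncoderUNet(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=256, depth=4, num_classes=1):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.grid = img_size // patch
        self.pos = nn.Parameter(torch.zeros(1, self.grid * self.grid, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.decoder = nn.Sequential(                  # 14x14 feature map -> 224x224 mask
            nn.ConvTranspose2d(dim, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(16, num_classes, 1),
        )

    def forward(self, x):
        t = self.patch_embed(x).flatten(2).transpose(1, 2) + self.pos
        t = self.encoder(t)                            # global self-attention over patches
        f = t.transpose(1, 2).reshape(-1, t.shape[-1], self.grid, self.grid)
        return self.decoder(f)                         # per-pixel polyp logits

print(ViTEncoderUNet()(torch.randn(1, 3, 224, 224)).shape)   # (1, 1, 224, 224)
```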

A Study on Auto-Classification of Acoustic Emission Signals Using Wavelet Transform and Neural Network (웨이블렛 변환과 신경망을 이용한 음향방출신호의 자동분류에 관한연구)

  • Park, Jae-Jun;Kim, Meyoun-Soo;Oh, Seung-Heon;Kang, Tae-Rim;Kim, Sung-Hong;Beak, Kwan-Hyun;Oh, Il-Duck;Song, Young-Chul;Kwon, Dong-Jin
    • Proceedings of the KIEE Conference
    • /
    • 2000.07c
    • /
    • pp.1880-1884
    • /
    • 2000
  • The discrete wavelet transform is utilized as preprocessing for a neural network (NN) to identify the aging state of internal partial discharge in a transformer. The discrete wavelet transform produces wavelet coefficients that are used for classification. Statistical parameters of the wavelet coefficients (maximum, average value, dispersion, skewness, and kurtosis) are fed into a back-propagation neural network, whose weights are obtained through cross-validation. The neural network training stops either when the error rate reaches an appropriate minimum or when the training time exceeds a fixed limit. After training, the network can decide whether a test signal corresponds to an early aging state, a late aging state, or a normal state.

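A minimal sketch of the feature pipeline described above: the five statistics computed per wavelet band, fed to a back-propagation MLP and scored with cross-validation. The wavelet family, decomposition level, network size, and synthetic acoustic emission signals are assumptions for illustration.

```python
import numpy as np
import pywt
from scipy.stats import skew, kurtosis
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def band_stats(signal, wavelet="db4", level=3):
    """Maximum, mean, dispersion, skewness, and kurtosis for each wavelet band."""
    feats = []
    for c in pywt.wavedec(signal, wavelet, level=level):
        feats += [c.max(), c.mean(), c.var(), skew(c), kurtosis(c)]
    return feats

rng = np.random.default_rng(2)
X, y = [], []
for label, scale in enumerate([0.5, 1.0, 2.0]):       # 0=normal, 1=early, 2=late aging
    for _ in range(40):
        X.append(band_stats(scale * rng.standard_normal(1024)))
        y.append(label)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=3000, random_state=0)
print(cross_val_score(clf, np.array(X), y, cv=5).mean())   # cross-validated accuracy
```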

A Comparison of Deep Neural Network Structures for Learning Various Motions (다양한 동작 학습을 위한 깊은신경망 구조 비교)

  • Park, Soohwan;Lee, Jehee
    • Journal of the Korea Computer Graphics Society
    • /
    • v.27 no.5
    • /
    • pp.73-79
    • /
    • 2021
  • Recently, in the field of computer animation, methods that generate motion using deep learning have been studied, moving away from conventional finite-state machines or graph-based methods. The expressiveness required of a network for learning motions is influenced more by the diversity of the motions it must contain than by the simple length of the motion data to be learned. This study aims to find an efficient network structure when the types of motions to be learned are diverse. In this paper, we train and compare four network structures: a basic fully-connected structure, a mixture-of-experts structure that uses multiple fully-connected layers in parallel, a recurrent neural network widely used for sequence-to-sequence processing, and a transformer structure used for sequence data in the natural language processing field.
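
The four structures compared above can be sketched in PyTorch for a toy motion representation; the pose vector size, layer widths, expert count, and depths are illustrative assumptions, not the study's configurations.

```python
import torch
import torch.nn as nn

D = 63                                            # assumed pose vector size per frame

fully_connected = nn.Sequential(nn.Linear(D, 256), nn.ReLU(),
                                nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, D))

class MixtureOfExperts(nn.Module):                # parallel fully-connected experts + gating
    def __init__(self, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(D, 256), nn.ReLU(), nn.Linear(256, D))
            for _ in range(n_experts))
        self.gate = nn.Sequential(nn.Linear(D, n_experts), nn.Softmax(dim=-1))

    def forward(self, x):
        w = self.gate(x)                                            # (B, E) expert weights
        outs = torch.stack([e(x) for e in self.experts], dim=-1)    # (B, D, E)
        return (outs * w.unsqueeze(1)).sum(-1)

recurrent = nn.GRU(D, 256, num_layers=2, batch_first=True)

transformer = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=7, batch_first=True), num_layers=4)

for name, m in [("fully-connected", fully_connected), ("mixture-of-experts", MixtureOfExperts()),
                ("recurrent", recurrent), ("transformer", transformer)]:
    print(name, sum(p.numel() for p in m.parameters()), "parameters")
```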

Kidney Tumor Segmentation Using a Hybrid CNN-Transformer Network for Partial Nephrectomy Planning (부분 신장 절제술 계획을 위한 하이브리드 CNN-트랜스포머 네트워크를 활용한 신장 종양 분할)

  • Goun Kim;Jinseo An;Yubeen Lee;Helen Hong
    • Journal of the Korea Computer Graphics Society
    • /
    • v.30 no.4
    • /
    • pp.11-18
    • /
    • 2024
  • In partial nephrectomy for kidney cancer treatment, accurate segmentation of the kidney tumor is crucial for surgical planning, as it provides essential information on the precise size and location of the tumor. However, it is challenging due to the tumor's similar intensity to surrounding organs and the variability in its location and size across patients. In this study, we propose a hybrid network that integrates a convolutional neural network and a transformer to capture both local and global features, aiming to improve the segmentation performance of kidney tumors. We validated our method through comparative experiments with UNETR++, outperforming it with a Dice Similarity Coefficient (DSC) of 78.54% and a precision of 85.07%. Moreover, in the analysis by tumor size, our method demonstrated improvements by reducing over-segmentation and outlier cases observed in UNETR++.
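
A minimal sketch of the hybrid idea described above: a convolutional branch for local features fused with a self-attention branch for global context. A 2D toy block is shown; the paper works on CT volumes, and its exact architecture and the UNETR++ comparison are not reproduced here.

```python
import torch
import torch.nn as nn

class HybridCNNTransformerBlock(nn.Module):
    def __init__(self, channels=64, nhead=4):
        super().__init__()
        self.local = nn.Sequential(                  # convolutional branch: local features
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU())
        self.attn = nn.MultiheadAttention(channels, nhead, batch_first=True)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        local = self.local(x)
        tokens = x.flatten(2).transpose(1, 2)        # (B, H*W, C) tokens for attention
        glob, _ = self.attn(tokens, tokens, tokens)  # global context via self-attention
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local, glob], dim=1))

x = torch.randn(1, 64, 32, 32)
print(HybridCNNTransformerBlock()(x).shape)          # (1, 64, 32, 32)
```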

MLSE-Net: Multi-level Semantic Enriched Network for Medical Image Segmentation

  • Di Gai;Heng Luo;Jing He;Pengxiang Su;Zheng Huang;Song Zhang;Zhijun Tu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.9
    • /
    • pp.2458-2482
    • /
    • 2023
  • Medical image segmentation techniques based on convolutional neural networks tend to suffer from redundant parameters during feature extraction and unsatisfactory target localization, which results in less accurate segmentation for assisting doctors in diagnosis. In this paper, we propose a multi-level semantic-rich encoding-decoding network, which consists of a Pooling-Conv-Former (PCFormer) module and a Cbam-Dilated-Transformer (CDT) module. The PCFormer module is used to tackle the issue of parameter explosion in the conventional transformer and to compensate for the feature loss of the down-sampling process. In the CDT module, the Cbam attention module is adopted to highlight feature regions by implicitly blending the intersection of attention mechanisms, and the Dilated convolution-Concat (DCC) module is designed as a parallel concatenation of multiple atrous convolution blocks to explicitly enlarge the receptive field. In addition, a MultiHead Attention-DwConv-Transformer (MDTransformer) module is utilized to clearly distinguish the target region from the background region. Extensive experiments on medical image segmentation datasets from Glas, SIIM-ACR, ISIC and LGG demonstrate that our proposed network outperforms existing advanced methods in terms of both objective evaluation and subjective visual performance.
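
The DCC design described above (parallel atrous convolutions concatenated to enlarge the receptive field) can be sketched as follows; channel counts and dilation rates are illustrative assumptions, not the authors' settings.

```python
import torch
import torch.nn as nn

class DilatedConvConcat(nn.Module):
    def __init__(self, in_ch=64, branch_ch=32, dilations=(1, 2, 4, 8)):
        super().__init__()
        # Parallel atrous (dilated) convolution branches with increasing dilation rates.
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(in_ch, branch_ch, 3, padding=d, dilation=d),
                          nn.BatchNorm2d(branch_ch), nn.ReLU())
            for d in dilations)
        self.project = nn.Conv2d(branch_ch * len(dilations), in_ch, 1)

    def forward(self, x):
        # Concatenate branch outputs along channels, then project back to in_ch.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 64, 56, 56)
print(DilatedConvConcat()(x).shape)    # (1, 64, 56, 56)
```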