• Title/Summary/Keyword: 트랜스포머 모델 (Transformer Model)


Lip and Voice Synchronization Using Visual Attention (시각적 어텐션을 활용한 입술과 목소리의 동기화 연구)

  • Dongryun Yoon;Hyeonjoong Cho
    • The Transactions of the Korea Information Processing Society / v.13 no.4 / pp.166-173 / 2024
  • This study explores lip-sync detection, focusing on the synchronization between lip movements and voices in videos. Typically, lip-sync detection techniques crop the facial area of a given video and use the lower half of the cropped box as input to a visual encoder to extract visual features. To place greater emphasis on the articulatory region of the lips for more accurate lip-sync detection, we propose using a pre-trained visual attention-based encoder. The Visual Transformer Pooling (VTP) module, originally designed for the lip-reading task of predicting a script from visual information alone without audio, is employed as the visual encoder. Our experimental results demonstrate that, despite having fewer trainable parameters, the proposed method outperforms the latest model, VocaList, on the LRS2 dataset, achieving a lip-sync detection accuracy of 94.5% with five context frames. Moreover, our approach achieves approximately 8% higher lip-sync detection accuracy than VocaList even on an unseen dataset, Acappella.
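
As a rough illustration of the sync-scoring setup this abstract describes (not the authors' code), the sketch below scores audio-visual synchronization as the cosine similarity between an embedding of a five-frame visual window and an embedding of the corresponding audio segment. The encoder modules, dimensions, and threshold are placeholders, with a VTP-style encoder standing in for `visual_encoder`.

```python
# Illustrative sketch only: scoring lip-sync from paired audio/visual embeddings.
# The encoders are placeholders for a VTP-style visual encoder and an audio encoder;
# shapes and the decision threshold are assumptions, not the paper's values.
import torch
import torch.nn.functional as F

def sync_score(visual_encoder, audio_encoder, frames, audio):
    """frames: (B, 5, C, H, W) window of five context frames; audio: (B, T) audio segment."""
    v = visual_encoder(frames)                 # (B, D) visual embedding of the 5-frame window
    a = audio_encoder(audio)                   # (B, D) audio embedding of the matching segment
    return F.cosine_similarity(v, a, dim=-1)   # high score -> in sync, low score -> off sync

def is_in_sync(score, threshold=0.5):
    # Binary sync decision; the threshold would be tuned on a validation set.
    return score > threshold
```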

Comparative Analysis of Self-supervised Deephashing Models for Efficient Image Retrieval System (효율적인 이미지 검색 시스템을 위한 자기 감독 딥해싱 모델의 비교 분석)

  • Kim Soo In;Jeon Young Jin;Lee Sang Bum;Kim Won Gyum
    • KIPS Transactions on Software and Data Engineering / v.12 no.12 / pp.519-524 / 2023
  • In hashing-based image retrieval, the hash code of a manipulated image differs from that of the original image, making it difficult to retrieve the same image. This paper proposes and evaluates a self-supervised deep hashing model that generates perceptual hash codes from feature information such as the texture, shape, and color of images. The comparison models are autoencoder-based variational inference models whose encoders are designed with fully connected layers, convolutional neural networks, and transformer modules. The proposed model is a variational inference model that includes a SimAM module for extracting geometric patterns and positional relationships within images. The SimAM module can learn latent vectors that highlight objects or local regions through an energy function based on the activation values of each neuron and its surrounding neurons. The proposed method is a representation learning model that generates low-dimensional latent vectors from high-dimensional input images, and the latent vectors are binarized into distinguishable hash codes. Experimental results on public datasets such as CIFAR-10, ImageNet, and NUS-WIDE show that the proposed model outperforms the comparison models and achieves performance comparable to supervised learning-based deep hashing models. The proposed model can be used in application systems that require low-dimensional representations of images, such as image search or copyright image determination.
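
The SimAM attention mentioned above has a simple closed-form energy function. The PyTorch sketch below follows the published SimAM formulation (per-channel spatial mean and variance, inverse energy passed through a sigmoid) and is offered as an illustration, not as this paper's exact module; the sign-based hash binarization at the end is likewise an assumed simplification.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention (Yang et al., 2021), illustrative re-implementation."""
    def __init__(self, eps=1e-4):
        super().__init__()
        self.eps = eps  # the lambda regularizer in the energy function

    def forward(self, x):
        # x: (B, C, H, W)
        b, c, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)   # squared deviation per position
        v = d.sum(dim=(2, 3), keepdim=True) / n              # per-channel spatial variance
        e_inv = d / (4 * (v + self.eps)) + 0.5               # inverse of the minimal energy
        return x * torch.sigmoid(e_inv)                      # reweight activations

# Binarizing a latent vector z into a hash code, as described in the abstract (illustrative):
# hash_code = (z > 0).to(torch.uint8)
```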

A Study on the Design of Prediction Model for Safety Evaluation of Partial Discharge (부분 방전의 안전도 평가를 위한 예측 모델 설계)

  • Lee, Su-Il;Ko, Dae-Sik
    • Journal of Platform Technology / v.8 no.3 / pp.10-21 / 2020
  • Partial discharge frequently occurs in high-voltage power equipment such as switchgear and transformers. Partial discharge shortens the life of insulators and causes insulation breakdown, resulting in large-scale damage such as power outages. Several types of partial discharge occur inside equipment and on its surface. In this paper, we design a predictive model that can predict the pattern and probability of occurrence of partial discharge. To analyze the designed model, training data for each type of partial discharge were collected through a UHF sensor using a simulator that generates partial discharge. The predictive model is based on a convolutional neural network (CNN), and the model was verified through training. To train the designed model, 5,000 training samples were created; the three-dimensional raw data from the UHF sensor were pre-processed into two-dimensional data and used as model input. The experimental results show that the trained model achieves an accuracy of 0.9972, and that accuracy was higher when the data were converted into two-dimensional grayscale images for training.
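
The abstract describes collapsing the 3D raw UHF measurements into 2D grayscale images before feeding a CNN classifier. The sketch below shows one plausible way to do that; the projection axis, number of discharge classes, and network layout are assumptions for illustration, not the paper's design.

```python
import numpy as np
import torch
import torch.nn as nn

def to_grayscale_image(raw_3d: np.ndarray) -> np.ndarray:
    """Collapse a 3D UHF measurement into a 2D, min-max normalized grayscale image.
    The choice of axis to project away is an assumption."""
    img = raw_3d.sum(axis=-1).astype(np.float32)
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)
    return img  # values in [0, 1]

class PDClassifier(nn.Module):
    """Small CNN over 2D grayscale partial-discharge patterns; layer sizes are illustrative."""
    def __init__(self, num_classes=4):   # number of discharge types is assumed
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes))

    def forward(self, x):  # x: (B, 1, H, W)
        return self.head(self.features(x))
```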

Lightening of Human Pose Estimation Algorithm Using MobileViT and Transfer Learning

  • Kunwoo Kim;Jonghyun Hong;Jonghyuk Park
    • Journal of the Korea Society of Computer and Information / v.28 no.9 / pp.17-25 / 2023
  • In this paper, we propose a MobileViT-based model that performs human pose estimation with fewer parameters and faster estimation. The base model achieves its lightweight design through a structure that combines features of convolutional neural networks with those of the Vision Transformer. The Transformer, the central mechanism in this study, has become increasingly influential as Transformer-based models outperform convolutional neural network-based models in computer vision. Likewise, in human pose estimation, the Vision Transformer-based ViTPose maintains the best performance on human pose estimation benchmarks such as COCO, OCHuman, and MPII. However, because the Vision Transformer has a heavy model structure with a large number of parameters and requires a relatively large amount of computation, training it is costly for users. Accordingly, the base model compensates for the Vision Transformer's weak inductive bias, which otherwise demands large amounts of computation, with local representations obtained through a convolutional neural network structure. Finally, the proposed model obtained a mean average precision of 0.694 on the MS COCO benchmark with 3.28 GFLOPs and 9.72 million parameters, roughly 1/5 and 1/9 of those of ViTPose, respectively.
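
To make the conv-plus-transformer idea concrete, the sketch below mixes a local convolutional representation with global self-attention in the spirit of MobileViT. It is not the block used by the paper's model: the real MobileViT block unfolds feature maps into patches before attention, whereas this simplified version treats each pixel as a token; channel count, head count, and the fusion convolution are assumptions.

```python
import torch
import torch.nn as nn

class SimplifiedMobileViTBlock(nn.Module):
    """Illustrative local-conv + global-attention block in the spirit of MobileViT."""
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.local = nn.Sequential(                       # local representation (conv inductive bias)
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1), nn.SiLU(),
        )
        self.attn = nn.TransformerEncoderLayer(            # global representation via self-attention
            d_model=channels, nhead=heads, dim_feedforward=2 * channels, batch_first=True
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 1)   # fuse input with the global path

    def forward(self, x):                                  # x: (B, C, H, W)
        b, c, h, w = x.shape
        local = self.local(x)
        tokens = local.flatten(2).transpose(1, 2)          # (B, H*W, C): pixels as tokens
        global_ = self.attn(tokens).transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([x, global_], dim=1))
```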

Class-Agnostic 3D Mask Proposal and 2D-3D Visual Feature Ensemble for Efficient Open-Vocabulary 3D Instance Segmentation (효율적인 개방형 어휘 3차원 개체 분할을 위한 클래스-독립적인 3차원 마스크 제안과 2차원-3차원 시각적 특징 앙상블)

  • Sungho Song;Kyungmin Park;Incheol Kim
    • The Transactions of the Korea Information Processing Society / v.13 no.7 / pp.335-347 / 2024
  • Open-vocabulary 3D point cloud instance segmentation (OV-3DIS) is a challenging visual task of segmenting a 3D scene point cloud into object instances of both base and novel classes. In this paper, we propose a novel model, Open3DME, for OV-3DIS that addresses important design issues and overcomes limitations of existing approaches. First, to improve the quality of class-agnostic 3D masks, our model uses T3DIS, an advanced Transformer-based 3D point cloud instance segmentation model, as its mask proposal module. Second, to obtain semantically text-aligned visual features for each point cloud segment, our model extracts both 2D and 3D features from the point cloud and the corresponding multi-view RGB images using pre-trained CLIP and OpenSeg encoders, respectively. Finally, to effectively use both the 2D and 3D visual features of each point cloud segment during label assignment, our model adopts a unique feature ensemble method. To validate our model, we conducted both quantitative and qualitative experiments on the ScanNet-V2 benchmark dataset, demonstrating significant performance gains.
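
The label-assignment step described above, matching each class-agnostic segment against open-vocabulary text embeddings through an ensemble of 2D and 3D features, can be sketched as follows. The weighting scheme, feature sources, and dimensions are assumptions for illustration, not the exact Open3DME ensemble.

```python
import torch
import torch.nn.functional as F

def assign_open_vocab_labels(feat2d, feat3d, text_emb, alpha=0.5):
    """feat2d, feat3d: (num_segments, D) per-segment visual features from the 2D and 3D
    branches (e.g., CLIP/OpenSeg-aligned); text_emb: (num_classes, D) class text embeddings.
    alpha is an assumed ensemble weight between the two modalities."""
    f2 = F.normalize(feat2d, dim=-1)
    f3 = F.normalize(feat3d, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    sim = alpha * (f2 @ t.T) + (1 - alpha) * (f3 @ t.T)    # ensemble of cosine similarities
    scores, labels = sim.max(dim=-1)                        # best-matching class per segment
    return labels, scores
```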

Research of weighting circuits for beamforming using speaker array II (스피커 어레이를 이용한 빔형성 가중회로 구성실험에 관한 연구 II)

  • Seo Jeong-Hun;Choi Nakjin;Lim Jun-Seok;Sung Koeng-Mo
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.463-466 / 2004
  • Beamforming techniques for underwater acoustic detection systems have long been studied by many researchers. Because beamforming directly affects detection performance, designing an optimal beam is an important problem. Implementing an optimal beamformer requires a weighting circuit for each individual sensor, and constructing the weighting circuit in turn requires equivalent-circuit modeling of each sensor and the design of a matching circuit. In previous work, we introduced the structure and usage of a sensor equivalent-circuit modeling tool and a matching-circuit design tool. Using both tools, we modeled the sensor equivalent circuits and matching circuits and, based on these, implemented the actual weighting circuits. A weighting circuit consists of a matching circuit and a transformer. In this paper, we conducted beamforming experiments using the weighting circuits constructed in this way and a speaker array fabricated to match a scale model, and compared the results with theoretical values. Based on these results, we verified the validity of a sonar sensor design toolset consisting of BeamCAD, the sensor equivalent-circuit modeling tool, and the matching-circuit design tool.
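
Since the experiment compares measured beam patterns against theory, the short NumPy sketch below computes the theoretical far-field beam pattern (array factor) of a uniformly spaced line array with arbitrary element weights, the role the weighting circuits play in hardware. Element count, spacing, frequency, and sound speed are placeholder values, not those of the paper's scale model.

```python
import numpy as np

def beam_pattern(weights, spacing, freq, c=343.0, angles_deg=np.linspace(-90, 90, 721)):
    """Far-field array factor of an N-element uniform line array.
    weights: complex element weights; spacing: element spacing [m]; freq: frequency [Hz]."""
    k = 2 * np.pi * freq / c
    n = np.arange(len(weights))
    theta = np.deg2rad(angles_deg)
    # Steering matrix: per-element phase toward each look angle.
    steer = np.exp(1j * k * spacing * np.outer(np.sin(theta), n))
    af = steer @ np.asarray(weights, dtype=complex)
    return angles_deg, 20 * np.log10(np.abs(af) / np.abs(af).max() + 1e-12)

# Example: 8 speakers, half-wavelength spacing at 1 kHz in air, uniform weighting.
angles, level_db = beam_pattern(np.ones(8), spacing=343.0 / 1000 / 2, freq=1000)
```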

High Switching Frequency and High Power Density Three-Level LLC Resonant Converter using Integrated Magnetics (Integrated Magnetics를 적용한 고속 스위칭 및 고전력밀도 3 레벨 LLC 공진형 컨버터)

  • Nam, Kyung-Hoon;Park, Chul-Wan;Bae, Ji-Hun;Ji, Sang-Keun;Ryu, Dong-Kyun;Choi, Heung-Gyoon;Han, Sang-Kyoo
    • Proceedings of the KIPE Conference / 2017.07a / pp.429-430 / 2017
  • This paper proposes a three-level LLC resonant converter employing integrated magnetics (IM). In the proposed three-level LLC resonant converter, the voltage stress on each switch is guaranteed to be half of the input voltage, so switching losses can be greatly reduced, which is advantageous for high-frequency operation. The proposed circuit is therefore favorable for reducing reactive components, but its operation requires two resonant inductors and one transformer. To address this, this paper proposes a new integrated magnetic structure that replaces the resonant inductors with the magnetizing inductance while integrating all magnetic components into a single device, and presents a theoretical analysis based on an inductance model together with experimental results from a 350 W, 800 kHz prototype to verify its validity.
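
To make the resonant-tank reasoning concrete, the sketch below evaluates the series resonant frequency and the standard first-harmonic-approximation (FHA) voltage gain of an LLC tank. The component values are arbitrary placeholders, not those of the 350 W, 800 kHz prototype, and the FHA model is a textbook approximation rather than this paper's analysis.

```python
import numpy as np

def llc_resonant_frequency(Lr, Cr):
    """Series resonant frequency f_r = 1 / (2*pi*sqrt(Lr*Cr))."""
    return 1.0 / (2 * np.pi * np.sqrt(Lr * Cr))

def llc_fha_gain(fs, Lr, Cr, Lm, Rac):
    """Normalized LLC voltage gain under the first harmonic approximation."""
    fr = llc_resonant_frequency(Lr, Cr)
    fn = fs / fr                  # normalized switching frequency
    Ln = Lm / Lr                  # inductance ratio
    Q = np.sqrt(Lr / Cr) / Rac    # quality factor of the loaded tank
    return 1.0 / np.sqrt((1 + 1 / Ln - 1 / (Ln * fn**2)) ** 2 + (Q * (fn - 1 / fn)) ** 2)

# Placeholder values only: Lr = 10 uH, Cr = 4 nF, Lm = 50 uH, Rac = 20 ohm.
fr = llc_resonant_frequency(10e-6, 4e-9)                               # about 796 kHz
gain = llc_fha_gain(fs=800e3, Lr=10e-6, Cr=4e-9, Lm=50e-6, Rac=20.0)   # gain = 1 at fs = fr
```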

A Transformer-Based Emotion Classification Model Using Transfer Learning and SHAP Analysis (전이 학습 및 SHAP 분석을 활용한 트랜스포머 기반 감정 분류 모델)

  • Subeen Leem;Byeongcheon Lee;Insu Jeon;Jihoon Moon
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.706-708 / 2023
  • In this study, we apply transfer learning to three pre-trained transformer models to classify five emotions. Among them, the KLUE (Korean Language Understanding Evaluation)-BERT (Bidirectional Encoder Representations from Transformers) model performs best, and its F1 scores indicate superior learning and generalization on the experimental data. To examine the basis of its predictions, we apply the SHAP (Shapley Additive Explanations) method to the KLUE-BERT model and present the findings with a text plot visualization. This approach enables us to grasp the impact of individual tokens on emotion classification and provides clear, visually interpretable evidence supporting the predictions of the KLUE-BERT model.
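
A minimal sketch of the kind of pipeline described above, using the Hugging Face transformers and shap libraries: the checkpoint name "your-org/klue-bert-emotion" is a placeholder for a KLUE-BERT model fine-tuned on five emotions (the study fine-tunes its own), and the example sentence is invented.

```python
# Illustrative sketch: emotion classification with a fine-tuned KLUE-BERT checkpoint
# plus SHAP token attribution. The checkpoint name below is a placeholder.
import shap
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="your-org/klue-bert-emotion",   # placeholder: assumed fine-tuned on 5 emotion labels
    top_k=None,                           # return scores for every emotion class
)

texts = ["오늘 발표가 너무 잘 돼서 정말 기뻐요!"]   # example Korean input

explainer = shap.Explainer(clf)           # SHAP can wrap a transformers text pipeline directly
shap_values = explainer(texts)

shap.plots.text(shap_values[0])           # text plot: per-token contribution to each emotion
```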

Deep Learning Models for Fabric Image Defect Detection: Experiments with Transformer-based Image Segmentation Models (직물 이미지 결함 탐지를 위한 딥러닝 기술 연구: 트랜스포머 기반 이미지 세그멘테이션 모델 실험)

  • Lee, Hyun Sang;Ha, Sung Ho;Oh, Se Hwan
    • The Journal of Information Systems / v.32 no.4 / pp.149-162 / 2023
  • Purpose In the textile industry, fabric defects significantly impact product quality and consumer satisfaction. This research seeks to enhance defect detection by developing a transformer-based deep learning image segmentation model that learns high-dimensional image features, overcoming the limitations of traditional image classification methods. Design/methodology/approach This study uses the ZJU-Leaper dataset to develop a model for detecting defects in fabrics. The ZJU-Leaper dataset includes defects such as presses, stains, warps, and scratches across various fabric patterns. The dataset was built from the defect labels and image files of ZJU-Leaper, and experiments were conducted with deep learning image segmentation models including Deeplabv3, SegformerB0, SegformerB1, and Dinov2. Findings The experimental results indicate that the SegformerB1 model achieved the highest performance, with an mIoU of 83.61% and a pixel F1 score of 81.84%. The SegformerB1 model showed greater sensitivity in detecting fabric defect areas than the other models. Detailed analysis of its inferences showed accurate predictions of diverse defects, such as stains and fine scratches, within intricate fabric designs.
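
As a hedged sketch of how a SegformerB1-style model could be set up for binary defect segmentation with the Hugging Face transformers library (not the authors' training code): the encoder checkpoint name, two-class label map, and the pixel F1 helper are assumptions chosen to mirror the metrics reported above.

```python
import torch
from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor

# Load a MiT-B1 encoder with a freshly initialized segmentation head for two classes
# (background vs. defect); "nvidia/mit-b1" is the public encoder checkpoint.
model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b1",
    num_labels=2,
    id2label={0: "background", 1: "defect"},
    label2id={"background": 0, "defect": 1},
)
processor = SegformerImageProcessor()

def pixel_f1(pred: torch.Tensor, target: torch.Tensor, cls: int = 1) -> float:
    """Pixel-level F1 for the defect class; pred and target are integer label maps."""
    tp = ((pred == cls) & (target == cls)).sum().item()
    fp = ((pred == cls) & (target != cls)).sum().item()
    fn = ((pred != cls) & (target == cls)).sum().item()
    return 2 * tp / (2 * tp + fp + fn + 1e-8)
```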

Automatic Post Editing Research (기계번역 사후교정(Automatic Post Editing) 연구)

  • Park, Chan-Jun;Lim, Heui-Seok
    • Journal of the Korea Convergence Society / v.11 no.5 / pp.1-8 / 2020
  • Machine translation refers to a system in which a computer translates a source sentence into a target sentence, and it has various subfields. APE (Automatic Post Editing) is a subfield of machine translation that produces better translations by editing the output of machine translation systems. In other words, it is the process of correcting errors in the translations generated by a machine translation system, as a form of proofreading. Rather than changing the machine translation model itself, this research field aims to improve translation quality by correcting the output sentences of the machine translation system. APE has been included as a WMT Shared Task since 2015, and performance is evaluated with TER (Translation Error Rate). As a result, various studies on APE models have been published recently, and this paper surveys the latest research trends in the field of APE.
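
TER, the evaluation metric mentioned above, is roughly the minimum number of edits needed to turn the system output into the reference, divided by the reference length. The sketch below computes a simplified word-level variant that omits TER's block-shift operation, so it is an approximation for illustration rather than the official metric implementation.

```python
def simple_ter(hypothesis: str, reference: str) -> float:
    """Simplified TER: word-level edit distance (insertions, deletions, substitutions)
    divided by reference length. Real TER also allows phrase shifts, omitted here."""
    hyp, ref = hypothesis.split(), reference.split()
    # Standard dynamic-programming edit distance over words.
    dp = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        dp[i][0] = i
    for j in range(len(ref) + 1):
        dp[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    return dp[len(hyp)][len(ref)] / max(len(ref), 1)

# One substitution against a six-word reference -> TER ≈ 0.167
print(simple_ter("a cat sat on the mat", "the cat sat on the mat"))
```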